Waifu Detection: When a Weeaboo Meets Deep Learning

Wai….. whaaat?

If you are into anime, you may or may not be familiar with the term waifu. It usually refers to the best or most adorable girl in an anime series. The word comes from the Japanese pronunciation of “wife”, and since weeaboos are the ones who really adore Japan, waifu became their term for a favorite anime girl. It’s not like they really want a 2-D girl as a wife, though (or do they?). In the end, it’s just a funny term for an adorable anime girl.

Re:Zero kara Hajimeru Isekai Seikatsu

Ah, back to our topic. If you are truly an anime fan, you should know this awesome series called Re:Zero kara Hajimeru Isekai Seikatsu. Haven’t watched it? Then you should, now! Especially if you are really into fantasy, romance, comedy, thriller, time travel, and more importantly... cute anime girls!

For those who have watched it already, you know very well what the most frustrating point of this series is:

The best girl in the series didn’t get what she deserves!!! Even though she is the one who made the most effort in the show.

If you are wondering, this is the female MC, who is good for nothing.

This is Emilia from Re:Zero, the female MC.

And this is the best girl, who puts in the most effort but gets dumped instead.

Her name is Rem, best girl in the show.

Since Rem was not noticed by the male MC of the show, let our deep learning algorithm notice her instead. She deserves it!

First Problem: Ram or Rem?

You know what? Rem is actually a twin; her sister’s name is Ram. How do we tell them apart? The easiest way is by their hair color.

Ram has red-colored hair, while Rem’s is blue.

Ah, a little background: I am actually really new to deep learning. So I think this is a perfect use case for my veeery first deep learning project. I don’t want to spend my time starting with only the MNIST dataset :p

So I want to create a classifier that can tell whether it is Ram or Rem in the picture. But... to make it more interesting, the input data will be in grayscale!

OK, so I prepared my tools of war: TensorFlow, OpenCV, Jupyter Notebook, and...

This Book

Soooo... I ended up spending my time learning the concept of CNNs and how to implement one using TensorFlow. Thanks to this, I got a really good grasp of how to use the TensorFlow framework and build my own architecture.

But you know what? 80% of my time was eaten up by debugging TensorFlow instead of fine-tuning my model and architecture :(. Even the save and read operations were another struggle!

The struggle is real! Use Keras instead :p

Ah, my code is available on this GitHub. But it is a really messy one, lol. I 97.43% guarantee you won’t understand it; what do you expect from a first project with a lot of debugging inside?

And the result?

Taraa! 83% accuracy on the test set. Not bad lah!

And just a reminder, the input data is actually a 96×96 grayscale image like this:

This is Ram, but in grayscale.
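
To make that preprocessing concrete, here is a minimal sketch of turning a color picture into a 96×96 grayscale input. It is only an illustration, not the code from the repo: the function names are made up, the grayscale weights are the common 0.299/0.587/0.114 luminance mix, and the resize is plain nearest-neighbor. In practice OpenCV handles both steps with cv2.cvtColor and cv2.resize.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels into rows of single gray values (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def resize_nearest(image, out_height, out_width):
    """Resize a 2-D image by copying the nearest source pixel (no smoothing)."""
    in_height, in_width = len(image), len(image[0])
    return [
        [
            image[y * in_height // out_height][x * in_width // out_width]
            for x in range(out_width)
        ]
        for y in range(out_height)
    ]


# A made-up 2x2 RGB image: one red, one blue, one white and one black pixel.
tiny_rgb = [
    [(255, 0, 0), (0, 0, 255)],
    [(255, 255, 255), (0, 0, 0)],
]

gray = to_grayscale(tiny_rgb)               # [[76, 29], [255, 0]]
model_input = resize_nearest(gray, 96, 96)  # 96 rows of 96 gray values
print(len(model_input), len(model_input[0]))
```

Notice that pure red and pure blue still land on different gray levels (76 vs 29 here), so dropping the color does not erase every trace of it.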

More importantly, what did I get from learning the CNN architecture? Well, the concept is simple enough that you can even implement a simple CNN in a spreadsheet!
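
Here is roughly what a spreadsheet-sized CNN looks like in plain Python: one convolution, a ReLU, and a 2×2 max pool, each just a few sums and max() calls over a grid of numbers. This is a toy sketch with a hand-picked edge-detecting kernel and a made-up 5×5 image, not the network I trained; a real CNN learns its kernels and stacks many of these layers.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(k_h)
                for j in range(k_w)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value of each 2x2 block (a trailing odd row/column is dropped)."""
    return [
        [
            max(
                feature_map[y][x], feature_map[y][x + 1],
                feature_map[y + 1][x], feature_map[y + 1][x + 1],
            )
            for x in range(0, len(feature_map[0]) - 1, 2)
        ]
        for y in range(0, len(feature_map) - 1, 2)
    ]


# Made-up 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1],
               [-1, 1]]

features = relu(conv2d(image, edge_kernel))  # 4x4 map, non-zero only at the edge
pooled = max_pool_2x2(features)              # 2x2 summary of that map
print(pooled)
```

Every cell of the output is the kind of formula you could type into a spreadsheet: a weighted sum of neighboring cells, then a max.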

Second Problem: We want our waifu, Rem, to be noticed!

I get it! Now I want to try another deep learning-based computer vision problem called object detection, where your model predicts where the object of interest is located within a picture or video! Isn’t it awesome? (At least for me.) You can see object detection in action:

Ahhh... I want so badly to create it by myself.
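
For a rough idea of what such a model hands back, here is a minimal sketch. It assumes the detector returns, for each frame, a list of boxes in normalized [ymin, xmin, ymax, xmax] form plus one confidence score per box, which is the layout the TensorFlow Object Detection API uses for its detection_boxes and detection_scores outputs. The function name, the 0.5 threshold, and the sample numbers are all made up for illustration.

```python
def keep_confident_boxes(boxes, scores, image_height, image_width, min_score=0.5):
    """Return (left, top, right, bottom, score) in pixels for confident detections."""
    kept = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue  # too unsure, do not draw this one
        kept.append((
            round(xmin * image_width),
            round(ymin * image_height),
            round(xmax * image_width),
            round(ymax * image_height),
            score,
        ))
    return kept


# Made-up detector output for one 720x1280 video frame.
boxes = [(0.10, 0.25, 0.50, 0.50), (0.60, 0.70, 0.90, 0.95)]
scores = [0.92, 0.31]

detections = keep_confident_boxes(boxes, scores, image_height=720, image_width=1280)
print(detections)  # [(320, 72, 640, 360, 0.92)]
```

The pixel coordinates that survive the threshold are what you would pass to a drawing call to get a box on screen.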

And so my new journey to find justice for Rem begins once again! Our tools of war for this problem? The Object Detection API from TensorFlow, scikit-video, and these 2 awesome blog posts: this one and this one (both are how-to tutorials for the TensorFlow object_detection API).

The very first step of this fancy project: hand-labeling every picture, one by one :”)!

I ended up labeling ~136 images for training and ~40 images for validation.

The code is available on my GitHub again, but the how-to-use guide might go in a separate article (I may write it in Bahasa, since a lot of people have already written tutorials in English). But you can use this as a dataset, since it has already been labeled (by me).

Long story short, I finished all of the preparation needed to train our waifu model on top of TensorFlow (a lot of preparation, btw). Now let’s train our model! On CPU, sadly.

Total loss after 4 hours of training

I ran it at midnight, went to sleep, and woke up with ~1600 iterations already done. Here is the validation result:

Our validation result: ~0.8 precision/mAP
What a nice green box on the example image 🙂
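
A quick note on what a number like that measures. The sketch below is a simplified illustration, not what the Object Detection API actually runs: it counts a predicted box as correct when it overlaps a hand-labeled box with an intersection-over-union (IoU) of at least 0.5, then reports the fraction of predictions that were correct. Real mAP goes further and averages precision over recall levels and classes. The function names, the threshold, and the boxes are all made up for the example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def precision_at_iou(predicted_boxes, labeled_boxes, iou_threshold=0.5):
    """Fraction of predictions that match a labeled box nobody else has claimed."""
    unmatched = list(labeled_boxes)
    true_positives = 0
    for predicted in predicted_boxes:
        best = max(unmatched, key=lambda label: iou(predicted, label), default=None)
        if best is not None and iou(predicted, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)  # each label can only be matched once
    return true_positives / len(predicted_boxes) if predicted_boxes else 0.0


# Made-up example: one labeled face, one good prediction, one stray prediction.
labeled = [(10, 10, 50, 50)]
predicted = [(12, 12, 52, 52), (100, 100, 140, 140)]

print(precision_at_iou(predicted, labeled))  # 0.5
```

Here the first prediction overlaps the label well (IoU of about 0.82) and the second one hits nothing, so half of the predictions count as correct.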

Aaaand... finally, here is the final result:

You might notice one or two or... more than two misclassifications by the model. Well, with only ~136 images for training, what do you expect? But somehow it performs really well even with very little data 🙂

Final words: even if our MC doesn’t want to notice Rem, our machine learning model can notice Rem’s face within a video. Well deserved for the best girl! 🙂

Ah, also, a tutorial might come in the next blog post, if some of you really need it. But we could also grab a cup of coffee; just contact me:

Personal email : imam.arrr@gmail.com or my LinkedIn.

Source: Deep Learning on Medium