How not to freak out near a banana stand and win 250 euros? Part 1.

Source: Deep Learning on Medium

This article is inspired by particular (potentially tabloid) news stories, but it uses them mostly as a premise to build a simple deep learning model in PyTorch. It also features some images of (huge) spiders, so if you have arachnophobia, consider yourself warned.

At least a few times a year, you may hear or read about unexpected guests in grocery stores, usually somewhere near the banana stand. It may be an unpleasant encounter with the infamous Brazilian wandering spider, known for its aggressiveness and toxic venom. In its natural habitat, this species dwells near banana trees, so it can accidentally end up in your local store with a banana shipment.

So when people see a suspicious eight-legged creature in a supermarket, they may have good reason to be disturbed. This may end with someone calling the police or the fire brigade, or taking other extreme measures.

When you spot a venomous spider…

However, human perception is not perfect, especially when it’s driven by fear. The result may be a false positive: a spider suspected to be a Brazilian wandering spider may be just a regular domestic one. For example, the giant house spider looks relatively similar; it may appear scary, but in fact it is harmless. In this case, alerting the police or fire department would be overkill.

This actually happened: someone mistook a giant house spider (left, source) for a Brazilian wandering spider (right, source), then called the police and the sanitary inspection (link at the end of this article)

There is more: in 2014, one German tabloid offered 250 euros for a photo of a Brazilian wandering spider. So if you identify the spider correctly, you can act accordingly: stay away from it, call the appropriate services and try to win the prize (when it is the venomous species), or simply avoid needless panic and a false alarm (when it is a harmless one). Luckily, we can do exactly that with some help from neural networks.

Deep learning for the win

Our task is simple binary classification: given images belonging to two groups (classes), we’d like to assign each image the appropriate label. In contrast to typical image classification tasks (e.g., dogs vs. cats), our classes represent fairly similar objects, namely two spider species: the Brazilian wandering spider (BWS) and the giant domestic spider (GDS). This kind of task is known in the literature as fine-grained classification. It can be challenging for deep learning algorithms because the classes look very similar and labeling the images correctly may require expert-level knowledge. Another problem may be a relatively small sample size. Having said that, let’s try to build our own fine-grained classifier.

The data

Probably the most tedious part of this task was creating a custom data set: I had to gather enough images of each class to train the classifier. I’ll spare you the boring details here (for a tutorial on this specific task, please check the link at the end of this article). In short, the whole process consisted of three steps:

  1. Downloading samples from Google Images search results and annotating them, if necessary.
  2. Adding some randomly selected frames from videos featuring our spiders (to make the data set a little bigger).
  3. Manual cleaning: dropping corrupted or irrelevant image files (memes, hand drawings, non-spiders, etc.).
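Only the first part of step 3 (spotting corrupted files) lends itself to automation; memes and hand drawings still need a human eye. Below is a minimal sketch, assuming Pillow is installed and the raw downloads sit in one folder tree. `find_broken_images` and the suffix list are hypothetical, not part of the author’s pipeline. Note that `verify()` is a cheap check that does not decode all pixel data, so a few damaged files may still slip through.

```python
# Sketch: flag image files that Pillow cannot open or verify.
# Assumes Pillow is installed; IMAGE_SUFFIXES is a hypothetical choice.
from pathlib import Path

from PIL import Image

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp"}


def find_broken_images(root):
    """Return image files under `root` that Pillow fails to open or verify."""
    broken = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and files that don't look like images.
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            with Image.open(path) as img:
                img.verify()  # integrity check without decoding every pixel
        except (OSError, SyntaxError, ValueError):
            # UnidentifiedImageError is a subclass of OSError.
            broken.append(path)
    return broken
```

Calling `find_broken_images("raw_images")` on a (hypothetical) download folder returns the paths worth deleting or inspecting by hand.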

The final data set consisted of 1887 images (837 for BWS, 1050 for GDS). For training, I used 80% of the instances of each class. I arranged the data in the way PyTorch’s ImageFolder expects, that is:

root/class_1/image_001.jpg   # class 1
root/class_2/image_001.jpg   # class 2
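The per-class 80/20 split can be scripted with the standard library alone. This is a sketch, not the author’s code. It assumes the cleaned images sit in one subfolder per class under `src_root`, and it writes an ImageFolder-style copy with hypothetical `train` and `valid` subsets. Splitting each class separately keeps the BWS/GDS proportions the same in both subsets.

```python
import random
import shutil
from pathlib import Path


def split_per_class(src_root, dst_root, train_fraction=0.8, seed=42):
    """Copy each class folder of src_root into dst_root/train and dst_root/valid.

    Every class is split separately, so both subsets keep the original
    class proportions. Returns {class_name: (n_train, n_valid)}.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split
    counts = {}
    class_dirs = sorted(p for p in Path(src_root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(files)
        n_train = round(len(files) * train_fraction)
        subsets = {"train": files[:n_train], "valid": files[n_train:]}
        for subset, subset_files in subsets.items():
            # e.g. dst_root/train/<class name>/...
            target = Path(dst_root) / subset / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in subset_files:
                shutil.copy2(f, target / f.name)
        counts[class_dir.name] = (n_train, len(files) - n_train)
    return counts
```

With this rounding, the article’s 837 BWS and 1050 GDS images would split into 670/167 and 840/210 images, respectively.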

With the data prepared like this, we can load it and apply some transformations. Random cropping and flipping act as data augmentation, which should make the model more robust and help it achieve a better score in the end. The transformations include:

  • resizing
  • cropping
  • flipping
  • normalization of color channels (we’ll need this for the pre-trained network)

Fortunately, PyTorch lets us do all of this with just a few lines of code.

Having done that, we may proceed to the modeling part.

The model

I knew that my data set was far from perfect in both quality and size, and I expected that a neural network trained from scratch might struggle with it. Hence I decided to use a pre-trained one: VGG19, trained on ImageNet data. Among the 1,000 classes it can recognize, there are several kinds of spiders, so we can reasonably hope that a network pre-trained on ImageNet already knows how to recognize spiders in pictures (or at least spider-specific shapes). More specifically, we can download a pre-trained network through PyTorch and simply plug our own classifier on top of it.

OK, our model is now ready for training. In PyTorch, this process is slightly more verbose than in Keras, TensorFlow’s high-level API.
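The article’s `train_model` definition isn’t shown, so here is a minimal sketch of what such a loop usually looks like in PyTorch. `run_epoch` and `fit` are hypothetical names, and Adam with a learning rate of 1e-3 is an assumed default. The loop works with any DataLoaders that yield `(images, labels)` batches. Only parameters with `requires_grad=True` are handed to the optimizer, so frozen pre-trained layers stay untouched.

```python
import torch
from torch import nn


def run_epoch(model, loader, criterion, optimizer=None, device="cpu"):
    """One pass over `loader`; trains if an optimizer is given, else evaluates."""
    training = optimizer is not None
    model.train(training)  # toggles dropout / batch-norm behaviour
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)
            loss = criterion(logits, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # criterion returns a batch mean, so re-weight by batch size.
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen


def fit(model, train_loader, valid_loader, epochs=5, lr=1e-3, device="cpu"):
    """Train for `epochs` epochs and return per-epoch (loss, accuracy) values."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    history = []
    for epoch in range(1, epochs + 1):
        train_loss, train_acc = run_epoch(model, train_loader, criterion, optimizer, device)
        valid_loss, valid_acc = run_epoch(model, valid_loader, criterion, device=device)
        history.append((train_loss, train_acc, valid_loss, valid_acc))
        print(f"epoch {epoch}: train loss {train_loss:.3f}, acc {train_acc:.3f} | "
              f"valid loss {valid_loss:.3f}, acc {valid_acc:.3f}")
    return history
```

Something like `fit(model, train_loader, valid_loader, epochs=5)` then plays the role of the `train_model` call used below, and it returns per-epoch loss and accuracy values for plotting.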

Let’s run it, with 5 epochs for a start:

# ...and train it
train_model(epochs=5)

If we did everything right, we should see our model become more accurate with each epoch: the loss decreases while the accuracy increases.

Loss (left) starts from a considerable value but quickly drops below 1.0; I had to trim the y-axis to make these differences visible

So our trained model can distinguish between the Brazilian wandering spider and the giant house spider in about 90% of cases. This score is quite promising, especially considering that our data set was neither very big nor exceptionally clean.

Updates from the last epoch: not bad for a model with generic parameters

Next steps

It would be nice to examine why our model assigned a particular label to each image. More specifically, it’s quite probable that pictures of the Brazilian wandering spider were taken in a different setting than pictures of the giant house spider (jungle vs. household). In that scenario, the model may learn to tell the species apart not by how the spider looks, but by the background of the picture (e.g., a spider on a forest-like background gets labeled as BWS). Our goal, however, is to classify each spider by its actual appearance, not its background or environment.
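The article doesn’t specify how this will be examined. One simple option is a vanilla gradient saliency map: the absolute gradient of the class score with respect to each input pixel, which shows the pixels that most influence the prediction. Below is a minimal sketch, assuming a trained PyTorch model and a single preprocessed image tensor of shape (C, H, W). `saliency_map` is a hypothetical helper, and more refined methods such as Grad-CAM exist.

```python
import torch


def saliency_map(model, image, target_class=None):
    """Vanilla gradient saliency: |d score / d pixel|, max over color channels.

    image: a single preprocessed tensor of shape (C, H, W).
    Returns an (H, W) tensor; large values mark pixels that most affect the score.
    """
    model.eval()  # disable dropout so the map is deterministic
    x = image.detach().unsqueeze(0).clone().requires_grad_(True)
    logits = model(x)
    if target_class is None:
        target_class = int(logits.argmax(dim=1))  # explain the predicted label
    model.zero_grad()
    logits[0, target_class].backward()
    # Gradient w.r.t. the input pixels, reduced over the channel dimension.
    return x.grad[0].abs().amax(dim=0)
```

If the bright regions of the map cover the leaves or the kitchen floor rather than the spider, that is a hint that the model relies on the background.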

This is a baseline version of our model: I used fairly generic parameters, which can still be fine-tuned. I chose accuracy as the default success metric, but since some misclassifications are more costly than others (e.g., a dangerous spider classified as harmless), other metrics or loss functions may be worth trying. It may also be a good idea to implement techniques designed specifically for fine-grained classification. All of these potential improvements could make our model more robust. We’ll try to implement them in part 2 of this article, so see you there!
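As a starting point for such metrics, here is a stdlib-only sketch that treats BWS as the positive (dangerous) class. Recall then measures the share of dangerous spiders the model actually catches, and `missed_positive` counts the costly "dangerous classified as harmless" errors. The second helper computes inverse-frequency class weights, one common way to change the loss so that the smaller class (837 BWS vs. 1050 GDS images) counts more. Both function names are hypothetical; scikit-learn’s `classification_report` covers the metrics part as well.

```python
def binary_report(y_true, y_pred, positive="BWS"):
    """Accuracy, precision, recall and F1, treating `positive` as the dangerous class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)  # dangerous, called harmless
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of dangerous spiders caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "missed_positive": fn,
    }


def inverse_frequency_weights(class_counts):
    """Weight each class by n_total / (n_classes * n_class); rarer classes weigh more."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {name: total / (k * n) for name, n in class_counts.items()}
```

If `w` is the dictionary returned by `inverse_frequency_weights`, its values could be passed to `nn.CrossEntropyLoss(weight=...)` as a tensor, in the same order as the dataset’s `classes` attribute. Raising the BWS weight further would penalize missed dangerous spiders even more.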

Useful links

Some context

Technical stuff

Code for model