Source: Deep Learning on Medium
This article has some code and graphs but is not meant to be highly technical.
One of Kaggle’s machine learning competitions asks you to predict how long it will take for a given pet to be adopted. It’s sponsored by PetFinder.my.
They give you a ton of useful data, like how old the animal is, whether it’s a cat or dog, its location, color, and so on.
There are also adoption photos for many of the pets, and the data includes a bunch of features extracted from Google’s Vision API for each of the images.
Our own Neural Network
The general strategy for the Kaggle competition seems to be to combine all the tabular data with the features from the images (as well as the descriptions) and, once it’s all structured together, run it through some model.
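Structuring it together is essentially a join of the image-derived features onto the tabular rows. A rough sketch with pandas, using made-up miniature tables (the `img_feat_*` column names are hypothetical, not the competition’s actual columns — only `PetID` is real):

```python
import pandas as pd

# Hypothetical miniature versions of the competition tables.
tabular = pd.DataFrame({
    "PetID": ["a1", "b2", "c3"],
    "Age": [3, 24, 6],
    "Type": [1, 2, 1],  # 1 = dog, 2 = cat
})
image_features = pd.DataFrame({
    "PetID": ["a1", "b2", "c3"],
    "img_feat_0": [0.12, 0.87, 0.33],  # illustrative feature columns
    "img_feat_1": [0.55, 0.02, 0.41],
})

# Join the image features onto the tabular rows by pet id, producing
# one wide row per pet that a model can consume.
combined = tabular.merge(image_features, on="PetID", how="left")
print(combined.shape)  # (3, 5)
```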
I was curious if we could predict adoption speed from the photos themselves. That is, without knowing anything about the age of the animal, where it is, how much adoption fees are, etc, can we predict how long it will take an animal to be adopted simply from the image associated with the animal?
Intuitively, this seems like an unsolvable problem, and it is definitely silly to attempt without all the additional information, but the goal was to extract the features from this network and combine them with the tabular data provided by Kaggle.
For those curious, the work is contained in this python jupyter notebook: https://github.com/gdoteof/neuralnet_stuff/blob/master/adoption_pictures_neural_nets.ipynb
So what did it find?
At first we were doing a little better than chance, which is roughly what was expected: we are getting the answer right about a third of the time and scoring about 0.2 on the quadratic weighted kappa. But it’s not really getting any better, at least not quickly enough (we want the blue line to go down).
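Quadratic weighted kappa penalizes a prediction more the further it lands from the true class, so near-misses hurt less than distant ones. scikit-learn computes it directly; a quick sketch with made-up labels (not the model’s actual predictions):

```python
from sklearn.metrics import cohen_kappa_score

# Made-up true and predicted adoption-speed classes (0-4), just to
# demonstrate the metric.
y_true = [0, 1, 2, 3, 4, 2, 1, 3]
y_pred = [0, 1, 2, 4, 4, 1, 1, 2]

# weights="quadratic" gives the quadratic weighted kappa used by the
# competition: off-by-one errors cost less than off-by-three errors.
kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(round(kappa, 3))
```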
This network is pre-trained on ImageNet, and knows how to differentiate between cats and dogs (and many other things), but the subtlety of recognizing that a dog or cat is super primed for getting adopted, or unlikely to ever be picked up, is nothing close to what ImageNet was trained for.
So, we allow the network to retrain the deeper parts of its structure, hoping it will specialize in these types of images, even at the cost of ruining its ability to recognize all the ImageNet categories.
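In transfer-learning terms this is the freeze-then-unfreeze pattern: first train only the new classification head, then unfreeze the body so deeper layers can adapt. A minimal PyTorch sketch, with a toy two-layer model standing in for the actual pretrained network (the architecture here is illustrative, not the one used in the notebook):

```python
import torch.nn as nn

# Toy stand-in for a pretrained backbone plus a new 5-class head.
body = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
head = nn.Linear(8, 5)  # 5 adoption-speed classes
model = nn.Sequential(body, head)

# Phase 1: freeze the body so only the head's weights get gradients.
for p in body.parameters():
    p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # only the head: 8*5 weights + 5 biases = 45

# Phase 2: unfreeze everything so the deeper layers can specialize.
for p in model.parameters():
    p.requires_grad = True
```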
This is much better! Our error rate is going down ever so slightly, but our kappa score is improving significantly, meaning the errors our model makes are at least getting smaller. However, at the end it looks like it is flattening out, and our validation is doing better than our training, so we should be able to go deeper.
Tweaking it a bit to look for more subtleties:
This is great. Our kappa score skyrockets, our actual error rate drops significantly, and we are finally starting to overfit. Our network has definitely learned *something*.
The Good, the Bad and the Ugly
So we trained the network to infer adoption speed from a photo. Specifically one of these categories:
0 — Pet was adopted on the same day as it was listed.
1 — Pet was adopted between 1 and 7 days (1st week) after being listed.
2 — Pet was adopted between 8 and 30 days (1st month) after being listed.
3 — Pet was adopted between 31 and 90 days (2nd & 3rd month) after being listed.
4 — No adoption after 100 days of being listed. (There are no pets in this dataset that waited between 90 and 100 days).
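The bucketing above can be sketched as a small helper (my own reconstruction of the label scheme, not code from the competition):

```python
def adoption_speed(days):
    """Map days-until-adoption to the competition's 0-4 classes.

    None means the pet was still unadopted after 100 days (class 4);
    the dataset has no pets adopted between 91 and 100 days.
    """
    if days is None:
        return 4
    if days == 0:
        return 0
    if days <= 7:
        return 1
    if days <= 30:
        return 2
    if days <= 90:
        return 3
    return 4

print(adoption_speed(0), adoption_speed(5), adoption_speed(45), adoption_speed(None))
```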
Taking a look at the confusion matrix, we can see our network guessed that only a single photo was class 0, and it was also correct.
Meaning, of all the images the network saw (over 10k), it thought only one of them was of an animal that would be adopted the same day it was listed.
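A confusion matrix like the one being read here comes straight from scikit-learn; a toy example (made-up labels, not the actual predictions) showing how the class-0 corner gets counted:

```python
from sklearn.metrics import confusion_matrix

# Toy ground truth and predictions over the five adoption-speed classes.
y_true = [0, 1, 2, 2, 3, 4, 4, 1]
y_pred = [0, 1, 2, 1, 3, 4, 3, 1]

# Rows are true classes, columns are predictions; cm[0, 0] counts the
# class-0 photos the model both predicted as class 0 and got right.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3, 4])
print(cm)
```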
I can’t even tell you how excited I was when I found this. So, without further ado: