Source: Deep Learning on Medium
Dog Breed Classification Using Convolutional Neural Networks
An evaluation of a self-made CNN architecture, VGG16 and ResNet50 on dog images to classify which breed they belong to
So I have been working on finishing the Capstone project of my data science nanodegree from Udacity. I wanted to learn the basics of convolutional neural networks (CNNs), given their reputation for working so well on image data, because I am aiming to complete my Master’s thesis in a similar area.
The project aims to classify dog images into 133 different breeds using labelled training images, much as a human would. Honestly speaking, even most humans cannot identify the very minor differences between some breeds. If our algorithm is trained first, just like a human, it will be able to identify the dog breed reasonably well. I know some people might be annoyed that I am comparing human learning to a CNN, but the idea is the same; it is just that the way of training is very different. A CNN learns from 2D image data using mathematics, while humans learn by repeating the same thing again and again. And of course humans are self-aware.
The algorithm takes a dog image as input and outputs the breed of the dog. If an image of a human is given instead, the algorithm returns the dog breed that the person most resembles. The model was trained on a GPU provided by the course staff, which made training significantly faster. If you don’t have access to a GPU, I have also uploaded the trained weights to my GitHub repository (link provided at the end). The script contains additional information regarding this.
The project was finished in 5 steps, each step contributing to the final goal of predicting the dog’s breed given an image. The project used 8,351 dog images covering the different breeds.
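The input/output behaviour described above can be sketched as a small dispatch function. This is only an illustration: the names `classify_image`, `is_dog`, `is_human` and `predict_breed` are hypothetical placeholders for the detectors and the breed model built in the steps, and the "neither" branch is my assumption about what to do when no dog or human is found.

```python
def classify_image(img_path, is_dog, is_human, predict_breed):
    """Route an image to the right message.

    is_dog, is_human: callables taking an image path and returning a bool.
    predict_breed: callable taking an image path and returning a breed name.
    All three are hypothetical stand-ins for the project's real functions.
    """
    # Check for a dog first, because a face detector can occasionally
    # mistake a dog's face for a human one.
    if is_dog(img_path):
        return f"Dog detected, predicted breed: {predict_breed(img_path)}"
    if is_human(img_path):
        return f"Human detected, most resembling breed: {predict_breed(img_path)}"
    # Assumption: the article does not say what happens in this case.
    return "Neither a dog nor a human was detected"


# Usage with dummy stand-ins instead of real models:
message = classify_image(
    "some_image.jpg",
    is_dog=lambda path: False,
    is_human=lambda path: True,
    predict_breed=lambda path: "Beagle",
)
print(message)
```

Passing the detectors in as arguments keeps the routing logic easy to test without loading any model.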
Step 1 — Human Face Detection:
This is a helper function that is used to identify a human face in an image. The project uses OpenCV’s Haar cascade XML classifier to detect faces. I also tested the performance of this function: it works really well on human images, but it occasionally mistakes a dog’s face for a human one.
Step 2 — Dog Face Detection:
This step is done using a pre-trained ResNet-50 model. The function outputs true or false depending on whether a dog was detected in the image.
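One common way to build such a detector, sketched here under assumptions, relies on the fact that ImageNet class indices 151 to 268 are all dog categories. The helper names and the exact preprocessing are illustrative; the project script may differ. The TensorFlow import is kept inside `dog_detector` so the index check can be tried on its own.

```python
import numpy as np

# ImageNet class indices 151..268 (inclusive) are dog categories.
DOG_CLASS_RANGE = range(151, 269)


def is_dog_prediction(class_probabilities):
    """Return True if the most likely ImageNet class is a dog category."""
    top_class = int(np.argmax(class_probabilities))
    return top_class in DOG_CLASS_RANGE


def dog_detector(img_path):
    """Assumed wiring: classify the image with ImageNet-trained ResNet-50."""
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    # For clarity the model is built on every call; in practice load it once.
    model = ResNet50(weights="imagenet")
    img = load_img(img_path, target_size=(224, 224))
    batch = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
    return is_dog_prediction(model.predict(batch)[0])


# The index check alone, using a fake 1000-class prediction:
fake_probs = np.zeros(1000)
fake_probs[200] = 1.0  # index 200 falls inside the dog range
print(is_dog_prediction(fake_probs))
```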
Step 3 — Creating CNN from Scratch:
This is where things get really exciting. I built a CNN architecture from scratch. The idea was simple: stack 3 convolutional layers after the input layer, with max pooling layers in between. Let’s look at the graphic for clarification:
I used the Keras Sequential API to build the model, with categorical cross-entropy as the loss for punishing/rewarding the weights and accuracy as the metric. Checkpoints were saved so I don’t have to train it over and over again. I initially trained for 5 epochs, which gave me around 1% accuracy; increasing this to 10 epochs gave 4% accuracy on the test set. For reference, random guessing across 133 breeds would score about 0.75%.
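A rough sketch of what such a model could look like in Keras follows. Only the overall shape (3 convolutional layers, max pooling, categorical cross-entropy, accuracy metric, checkpoints, 133 classes) comes from the description above. The filter counts, kernel size, input size, optimizer, pooling after the last convolution, and the checkpoint filename are all assumptions of mine.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import ModelCheckpoint


def build_scratch_cnn(input_shape=(224, 224, 3), num_classes=133):
    """Small CNN: 3 conv layers with max pooling, then a softmax classifier."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Filter counts (16/32/64) and kernel size 2 are assumed values.
        layers.Conv2D(16, 2, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 2, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 2, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        # Collapse each feature map to one value before classifying.
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="rmsprop",  # assumed optimizer
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def make_checkpoint(path="scratch_cnn.weights.h5"):
    """Save the best weights seen so far (hypothetical filename)."""
    return ModelCheckpoint(
        filepath=path, save_best_only=True, save_weights_only=True
    )
```

Training would then look something like `model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10, callbacks=[make_checkpoint()])`, where the arrays are placeholders for the prepared image tensors and one-hot labels.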
Step 4 — Transfer learning using Pre-trained VGG16 CNN:
VGG16 has 16 weight layers: 13 convolutional layers followed by 3 fully connected ones. On top of the pre-trained network I added a global average pooling layer to shrink each feature map down to a single value, and then used a fully connected layer with softmax activation to classify the dog breeds. I trained the model for 20 epochs. Validation loss was 8.7 and validation accuracy was 39%. The model reached 40% accuracy on the test set (the modest score is probably due to the small training dataset). That is better than my model from scratch, but it still needs significant improvement.
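The classification head described above can be sketched as follows. I am assuming here that the head is trained on pre-computed VGG16 features: with the fully connected top removed and 224x224 inputs, the convolutional base outputs maps of shape 7x7x512. The function name and the optimizer are also assumptions.

```python
from tensorflow.keras import layers, models


def build_transfer_head(feature_shape=(7, 7, 512), num_classes=133):
    """Global average pooling plus a softmax layer on pre-trained features."""
    head = models.Sequential([
        layers.Input(shape=feature_shape),
        # Averages each 7x7 feature map, leaving one value per channel.
        layers.GlobalAveragePooling2D(),
        # One output probability per dog breed.
        layers.Dense(num_classes, activation="softmax"),
    ])
    head.compile(
        optimizer="rmsprop",  # assumed optimizer
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return head


head = build_transfer_head()
head.summary()
```

Only the final dense layer has trainable weights (512 x 133 weights plus 133 biases), which is why this kind of head trains quickly even on a small dataset. The same function could be reused for another backbone by changing `feature_shape`.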
Step 5 — Transfer learning using Pre-trained ResNet-50:
Next I did the same thing with ResNet-50 and got an accuracy of 80% on the test set, which is pretty great considering the small dataset. Validation loss dropped significantly to 0.98, with a validation accuracy of 83%. This is definitely my model of choice for now, but I aim to improve it further in the future. Some results from ResNet-50 are shown:
This was my first project in deep learning, and I was lucky to have GPU access through Udacity, which significantly sped up my learning. I am still in the learning phase, so any recommendations or feedback are welcome.
My github repository: https://github.com/Hariss096/dog_breed_classification