Original article was published on Deep Learning on Medium
Dog Breed Classifier: Udacity Data Scientist Nanodegree Blog
I am currently enrolled in Udacity's Data Scientist Nanodegree (ND). As part of the ND, I need to complete a capstone project, and I chose the Dog Breed Classifier. Although I have been an ML/DL enthusiast for the last few years, I had not worked on a classification project with more than 20 classes. For that reason, dog breed classification looked like a good fit: it has 133 classes, or in other words, 133 dog breeds.
The objective of this project is to build a classifier that first decides whether an image shows a dog or a human, and then classifies a detected dog into one of 133 dog breed categories (or, for a human, reports which dog breed the person most resembles).
The work breaks down into the following steps:
- Step 0: Import Datasets and libraries in use
- Step 1: Detection of Humans
- Step 2: Detection of Dogs
- Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
- Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)
- Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)
- Step 6: Write your Algorithm
- Step 7: Test Your Algorithm
We will go over the steps one at a time in the following sections.
Step 0: Import Datasets and libraries in use
The 8351 dog images have been split into training, validation, and test sets in an 80:10:10 ratio. The dog images are spread almost evenly across the 133 breed categories, with each breed having about 50 images on average and a standard deviation of about 12. The human data set contains 13233 images.
Popular libraries were used throughout the project: scikit-learn for loading and separating the data, Keras for the model-related work, and NumPy for pre- and post-processing.
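To make the 80:10:10 ratio concrete, here is a minimal sketch of such a split. It assumes a plain shuffled split and uses made-up file names; the function name `split_80_10_10` is hypothetical, and the project data may well have been partitioned differently.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train/validation/test lists (80:10:10)."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = int(0.8 * len(items))
    n_valid = int(0.1 * len(items))
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


# Hypothetical file names standing in for the 8351 dog images.
paths = [f"dog_{i:04d}.jpg" for i in range(8351)]
train, valid, test = split_80_10_10(paths)
print(len(train), len(valid), len(test))  # 6680 835 836
```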
Step 1: Detection of Humans
OpenCV's implementation of Haar feature-based cascade classifiers is used to detect human faces in images. First, each color image is converted to grayscale so that it is easier to process. The cascade classifier then reports the faces present in the image, and an image counts as containing a human if at least one face is found. A human was correctly detected in 100% of the images that contained one. On the other hand, a human was incorrectly detected in 11% of the images that did not contain one.
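The detection step can be sketched roughly as follows. The cascade file is an assumption (a frontal-face cascade bundled with the `opencv-python` package), and `make_face_detector` and `detection_rate` are hypothetical helper names, not necessarily the ones used in the project.

```python
def make_face_detector(cascade_path=None):
    """Build a function that reports whether an image contains a human face."""
    import cv2  # imported lazily so detection_rate() below works without OpenCV

    if cascade_path is None:
        # Assumption: use a frontal-face cascade shipped with opencv-python.
        cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    face_cascade = cv2.CascadeClassifier(cascade_path)

    def face_detector(img_path):
        img = cv2.imread(img_path)                    # OpenCV loads images as BGR
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale first
        faces = face_cascade.detectMultiScale(gray)   # one bounding box per face
        return len(faces) > 0

    return face_detector


def detection_rate(detector, img_paths):
    """Fraction of images for which detector(path) returns True."""
    hits = sum(1 for path in img_paths if detector(path))
    return hits / len(img_paths)
```

Running `detection_rate` once over images with humans and once over images without them gives the two percentages quoted above.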
Step 2: Detection of Dogs
A pre-trained ResNet-50 model from Keras, a TensorFlow wrapper, is used to detect dogs in images. The model is loaded with ImageNet weights. Given an image, this pre-trained model returns a prediction for the object the image contains.
Pre-processing starts with a function that takes a string-valued file path to a color image and returns a 4D tensor suitable for supplying to a Keras CNN. The image is first resized to a square of 224×224 pixels and converted to an array, which is then expanded to a 4D tensor. The channels are also reordered from RGB to BGR so that the input matches what ResNet-50 expects. In the final step, the processed image is fed to the pre-trained ResNet-50, which outputs a probability for each ImageNet category. Because the ImageNet categories with indices 151 to 268 (inclusive) are the dog classes, an image is treated as a dog whenever the predicted index falls in that range. The detector was 100% accurate in both cases, that is, on images containing a dog and on images containing a human.
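A rough sketch of this pipeline is shown below. It assumes a recent TensorFlow 2.x install where `load_img` and `img_to_array` live under `tensorflow.keras.utils`; the function names are illustrative, and the imports sit inside the functions only so that the index check can be used on its own.

```python
DOG_CLASS_RANGE = range(151, 269)  # ImageNet indices 151..268, inclusive


def is_dog_class(index):
    """True if an ImageNet class index falls in the dog range."""
    return index in DOG_CLASS_RANGE


def path_to_tensor(img_path):
    """Load an image, resize it to 224x224, and return a 4D tensor."""
    import numpy as np
    from tensorflow.keras.utils import img_to_array, load_img

    img = load_img(img_path, target_size=(224, 224))
    x = img_to_array(img)             # shape (224, 224, 3)
    return np.expand_dims(x, axis=0)  # shape (1, 224, 224, 3)


def dog_detector(img_path, model):
    """True if the model's top ImageNet prediction is one of the dog classes."""
    import numpy as np
    from tensorflow.keras.applications.resnet50 import preprocess_input

    # preprocess_input reorders RGB to BGR and subtracts the ImageNet channel means.
    x = preprocess_input(path_to_tensor(img_path))
    predicted_index = int(np.argmax(model.predict(x)))
    return is_dog_class(predicted_index)


# Usage (downloads the ImageNet weights on first run):
#   from tensorflow.keras.applications.resnet50 import ResNet50
#   model = ResNet50(weights="imagenet")
#   dog_detector("some_image.jpg", model)
```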
Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
In this step, a CNN that classifies dog breeds has to be built from scratch, and it should reach an accuracy of at least 1%. Accuracy was chosen as the performance metric because this is a classification problem in which the data are almost evenly distributed. The whole point of building a CNN from scratch is to get a feel for how CNNs work, for example how the learning is done and what role the different layer types, such as convolutional and max-pooling layers, play.
The hinted architecture was used, because experience suggests that however hard you try, the accuracy will stay poor unless you train a fairly complex model on a large data set such as ImageNet. With the hinted architecture, the accuracy bar of at least 1% was cleared.
With a final fully connected layer of 133 nodes, one per dog breed, and a softmax activation function, the model reached an accuracy of 4%.
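For context, guessing uniformly at random among 133 breeds is right about 0.75% of the time, so the 1% bar sits just above chance. The sketch below shows that calculation together with one possible small CNN ending in the 133-node softmax layer. The filter counts, kernel sizes, and optimizer are assumptions for illustration, not the exact hinted architecture.

```python
NUM_BREEDS = 133


def chance_accuracy(num_classes=NUM_BREEDS):
    """Expected accuracy of guessing uniformly at random."""
    return 1.0 / num_classes


def build_scratch_cnn(input_shape=(224, 224, 3), num_classes=NUM_BREEDS):
    """A small CNN: three conv/max-pool stages, then a softmax classifier."""
    from tensorflow.keras import layers, models  # lazy import, needs TensorFlow

    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Assumed sizes: 16, 32, and 64 filters with 2x2 kernels.
        layers.Conv2D(16, kernel_size=2, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(32, kernel_size=2, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(64, kernel_size=2, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.GlobalAveragePooling2D(),
        # One output node per breed; softmax turns scores into probabilities.
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


print(f"chance level: {chance_accuracy():.4f}")  # chance level: 0.0075
```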
Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)
After the drastic failure in the previous step, which is normal for a model trained from scratch on little data, a pre-trained VGG-16 model was used as a fixed feature extractor: the output of the last convolutional block of VGG-16, known as the bottleneck features, is fed as input to our model. On top of it we add a global average pooling layer followed by a fully connected layer, where the latter contains one node per dog category and uses a softmax activation function.
Because the bottleneck features are computed once up front, this approach takes the heavy processing off our hands and keeps the code lean, and it reaches an accuracy of around 36% after 20 epochs.
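A minimal sketch of such a head is given below. It assumes the bottleneck features have shape (7, 7, 512), which is what VGG-16 produces for 224×224 inputs; the function names and the optimizer are illustrative choices.

```python
def head_param_count(channels=512, num_classes=133):
    """Trainable parameters of the head: Dense weights plus Dense biases.

    Global average pooling has no parameters of its own.
    """
    return channels * num_classes + num_classes


def build_vgg16_head(feature_shape=(7, 7, 512), num_classes=133):
    """Classifier trained on precomputed VGG-16 bottleneck features."""
    from tensorflow.keras import layers, models  # lazy import, needs TensorFlow

    model = models.Sequential([
        layers.Input(shape=feature_shape),
        layers.GlobalAveragePooling2D(),                  # (7, 7, 512) -> (512,)
        layers.Dense(num_classes, activation="softmax"),  # one node per breed
    ])
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


print(head_param_count())  # 68229
```

Only these parameters are trained; the VGG-16 weights stay fixed, which is why training is cheap.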
Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)
In search of a better CNN, the bottleneck features of the pre-trained VGG-19, ResNet-50, Inception, and Xception models were offered as options to choose from, and the objective was to reach an accuracy of at least 60%.
ResNet-50 was picked, and on top of its bottleneck features we added a global max pooling layer, a fully connected layer followed by a dropout layer, and finally a dense layer with 133 outputs. Following the norm, softmax was used as the activation function of the output layer. All of the models mentioned above, including ResNet-50, have been trained on a huge number of ImageNet images using large computational resources and are known to perform well, so using one of them looked like a more viable option for this problem than creating a CNN from scratch. Categorical cross-entropy was used as the loss function and SGD (stochastic gradient descent) as the optimizer. After training for 20 epochs with a batch size of 24, an accuracy of 81% was obtained.
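The head described above could look roughly like this. The hidden layer size and its ReLU activation are assumptions, since they are not stated here; the 40% dropout rate is the one mentioned in the closing discussion, and `feature_shape` must match the bottleneck features actually used.

```python
TRAINING_CONFIG = {
    "epochs": 20,
    "batch_size": 24,
    "loss": "categorical_crossentropy",
    "optimizer": "sgd",
}


def build_resnet50_head(feature_shape, hidden_units=512, dropout_rate=0.4,
                        num_classes=133):
    """Classifier trained on precomputed ResNet-50 bottleneck features."""
    from tensorflow.keras import layers, models  # lazy import, needs TensorFlow

    model = models.Sequential([
        layers.Input(shape=feature_shape),  # e.g. train_features.shape[1:]
        layers.GlobalMaxPooling2D(),
        layers.Dense(hidden_units, activation="relu"),  # assumed size/activation
        layers.Dropout(dropout_rate),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=TRAINING_CONFIG["optimizer"],
                  loss=TRAINING_CONFIG["loss"],
                  metrics=["accuracy"])
    return model


# Usage, with hypothetical arrays of bottleneck features and one-hot labels:
#   model = build_resnet50_head(train_features.shape[1:])
#   model.fit(train_features, train_targets,
#             validation_data=(valid_features, valid_targets),
#             epochs=TRAINING_CONFIG["epochs"],
#             batch_size=TRAINING_CONFIG["batch_size"])
```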
Step 6: Write your Algorithm
The final building step is to write an algorithm that can be used to predict the dog breed from the provided image (dog or human). It accepts a file path to an image and first determines whether the image contains a human, a dog, or neither. Then,
- if a dog is detected in the image, return the predicted breed.
- if a human is detected in the image, return the resembling dog breed.
- if neither is detected in the image, provide an output that indicates an error.
This algorithm ties together the previously written functions, for example the human face detector and the dog detector.
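The branching logic can be sketched as below. The detectors and the breed predictor are passed in as plain callables, so the names `classify_image`, `is_dog`, `is_human`, and `predict_breed` are placeholders for the functions built in the earlier steps, and the message wording is only illustrative.

```python
def classify_image(img_path, is_dog, is_human, predict_breed):
    """Return a message describing what was found in the image.

    is_dog and is_human take a file path and return a bool;
    predict_breed takes a file path and returns a breed name.
    """
    if is_dog(img_path):
        return f"Dog detected, predicted breed: {predict_breed(img_path)}"
    if is_human(img_path):
        return f"Human detected, resembling breed: {predict_breed(img_path)}"
    return "Error: neither a dog nor a human was detected."


# Example with stand-in detectors that just look at the file name.
message = classify_image(
    "my_dog.jpg",
    is_dog=lambda path: "dog" in path,
    is_human=lambda path: "human" in path,
    predict_breed=lambda path: "Beagle",
)
print(message)  # Dog detected, predicted breed: Beagle
```

The dog check comes first here, mirroring the order of the cases listed above.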
Step 7: Test Your Algorithm
In the last step, a few sample images were tested, and the classifier seems to be doing a decent job on them.
Though the pre-trained model did well, giving 85% accuracy, the training and validation accuracy clearly show that the model is overfitting. A dropout of 40% did not help either, which points to the need for batch normalization. Increasing the number of epochs might come to mind, but it will not help until the overfitting is dealt with. Regularization, or making a few of the pre-trained layers trainable, might be worth trying, while a more involved but effective approach would be a grid search for suitable hyperparameters. Lastly, adding more data could help, depending on how well the model generalizes.
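As an illustration of the grid search idea, here is a minimal, library-free sketch. `train_and_evaluate` is a hypothetical function that would train the model with the given hyperparameters and return its validation accuracy; in the example it is replaced by a toy scoring function so the code runs on its own.

```python
from itertools import product


def grid_search(train_and_evaluate, param_grid):
    """Try every combination in param_grid and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy stand-in for real training: peaks at dropout_rate=0.5, hidden_units=256.
def toy_score(dropout_rate, hidden_units):
    return -abs(dropout_rate - 0.5) - abs(hidden_units - 256) / 1000


grid = {"dropout_rate": [0.3, 0.4, 0.5], "hidden_units": [128, 256, 512]}
best_params, best_score = grid_search(toy_score, grid)
print(best_params)  # {'dropout_rate': 0.5, 'hidden_units': 256}
```

With a real model, every combination means a full training run, so the grid should stay small.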