Source: Deep Learning on Medium
The purpose of the project is to create a dog breed classifier that can identify a dog’s breed given a picture of a dog. It can also tell when a picture contains a human instead of a dog and, as a fun twist, report which dog breed the human most resembles. The dataset was loaded using sklearn and contains 8351 dog images across 133 categories.
For this classification problem I used four Convolutional Neural Networks. One was built from scratch; the other three were pretrained models: ResNet-50, VGG-16, and Xception. The metric for evaluating these models was accuracy, and the Xception model achieved the highest accuracy.
The classification is done in Python 3. The Python packages used are:
All these packages are available through pip install.
First the dataset was loaded from sklearn, and the distribution of images across the available 133 dog breeds was plotted. From the graph it is clear that the data is well balanced, with around 50 images per breed on average.
To detect human faces, and to come up with a fun dog-breed lookalike, 13233 human images were used. Face detection was done using a cascade classifier (face_cascade) from the cv2 package. cv2, or OpenCV, is a great package for this type of analysis.
Then a human face detector function was created. It detected faces in 100% of the human images and, oddly, in 11% of the dog images.
A ResNet-50 model was then used to detect dogs in the photos. The images were first preprocessed using Keras. When using TensorFlow as the backend, Keras CNNs require a 4D array (which we’ll also refer to as a 4D tensor) as input, with shape (nb_samples, rows, columns, channels), where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.
The path_to_tensor function takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224×224 pixels. Next, the image is converted to an array, which is then reshaped into a 4D tensor. In this case, since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have shape (1, 224, 224, 3).
The paths_to_tensor function takes a numpy array of string-valued image paths as input and returns a 4D tensor with shape (nb_samples, 224, 224, 3).
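The two helpers described above can be sketched like this, assuming the Keras image utilities (exact import paths vary across Keras versions):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def path_to_tensor(img_path):
    # load an RGB image and resize it to 224x224
    img = load_img(img_path, target_size=(224, 224))
    # convert to a (224, 224, 3) array, then add a leading sample axis
    x = img_to_array(img)
    return np.expand_dims(x, axis=0)  # shape (1, 224, 224, 3)

def paths_to_tensor(img_paths):
    # stack the single-image tensors into shape (nb_samples, 224, 224, 3)
    return np.vstack([path_to_tensor(p) for p in img_paths])
```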
Three of the resulted preprocessed images were then viewed.
Then a dog detector was built using the ResNet-50 model. This dog detector detected dogs in 100% of the dog images and 0% of the human images.
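One common way to build such a detector, and a plausible sketch of the approach here, exploits the fact that ImageNet classes 151 through 268 are all dog breeds, so an ImageNet-pretrained ResNet-50 prediction in that range can be treated as "dog" (the input is assumed to be a tensor produced by path_to_tensor):

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# ImageNet-pretrained ResNet-50 (downloads weights on first use)
ResNet50_model = ResNet50(weights="imagenet")

def dog_detector(img_tensor):
    """img_tensor: a (1, 224, 224, 3) array; True if a dog is predicted."""
    pred = np.argmax(ResNet50_model.predict(preprocess_input(img_tensor)))
    # ImageNet indices 151..268 correspond to dog breeds
    return bool(151 <= pred <= 268)
```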
Then a CNN was built from scratch. For this, the images were preprocessed using the most common technique: dividing the image arrays by 255 to normalize them. Normalization is a crucial preprocessing step for deep neural networks, which tend to work better with normalized data.
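The rescaling step itself is one line: pixel values in [0, 255] are divided by 255 so the network sees inputs in [0, 1]. A toy illustration:

```python
import numpy as np

# toy 4D tensor of uint8 pixel values (1 sample, 1x1 image, 3 channels)
tensors = np.array([[[[0, 128, 255]]]], dtype=np.uint8)

# cast to float and rescale into [0, 1]
normalized = tensors.astype("float32") / 255
```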
The architecture for this CNN is as follows:
The layers used in the architecture are as follows:
- Max Pooling
The model was trained for 10 epochs. It was compiled with the rmsprop optimizer, categorical_crossentropy as the loss function, and accuracy as the metric.
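The exact layer stack is not reproduced above, but a representative scratch CNN of this kind, alternating convolution and max pooling and ending in a 133-way softmax, might look like the following (filter counts are illustrative assumptions; the compile settings match those described):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D,
                                     GlobalAveragePooling2D, Dense)

model = Sequential([
    Conv2D(16, 2, activation="relu", input_shape=(224, 224, 3)),
    MaxPooling2D(2),
    Conv2D(32, 2, activation="relu"),
    MaxPooling2D(2),
    Conv2D(64, 2, activation="relu"),
    MaxPooling2D(2),
    GlobalAveragePooling2D(),
    Dense(133, activation="softmax"),  # one node per dog breed
])

model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```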
The test accuracy obtained was 8.1340%, which is very low.
We then used transfer learning to reduce training time without sacrificing accuracy, starting with a VGG-16 model. For transfer learning we first obtained bottleneck features. The model then uses the pre-trained VGG-16 as a fixed feature extractor, where the last convolutional output of VGG-16 is fed as input to our model. We only add a global average pooling layer and a fully connected layer, where the latter contains one node for each dog category and is equipped with a softmax.
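Because the bottleneck features are precomputed, the trainable head is tiny. A sketch, assuming VGG-16 bottleneck features of shape (7, 7, 512) for 224×224 inputs:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

# Head trained on precomputed VGG-16 bottleneck features
VGG16_model = Sequential([
    GlobalAveragePooling2D(input_shape=(7, 7, 512)),
    Dense(133, activation="softmax"),  # one node per dog breed
])

VGG16_model.compile(optimizer="rmsprop",
                    loss="categorical_crossentropy",
                    metrics=["accuracy"])
```

Only the pooling and dense layers are trained, which is why transfer learning is so much faster than training a CNN from scratch.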
Using the same compile configuration as before, we got a better result with this classifier. After 20 epochs I was able to obtain a 42.4641% accuracy on the test set.
There were many options available for further transfer learning; I chose Xception. The model architecture is as follows:
Again I used the same compile configuration, but trained for 100 epochs, and the result was better than I expected: an accuracy of 85.4067%.
The performance of the pre-trained models far exceeded the hand-made CNN: the Xception model reached about 85% accuracy, while my scratch CNN managed about 8%. This is because the pretrained models were trained on a much larger dataset. Although the accuracy obtained is high, it is not perfect, and the result could be improved further by augmenting the dataset.