Source: Deep Learning on Medium
What if you need to identify a dog's breed but can't memorize more than a hundred of them? And, just for fun, what if you want to know which dog breed you most resemble?
Deep learning with neural networks is one of the hottest topics in data science today, and classifying objects in images is one of the best-known problems it solves well.
In deep learning we have two options: build our own model from scratch or use transfer learning.
Transfer learning means starting from a pre-trained model and reusing the features it has already learned to identify, such as edges, patterns (nose, tail, leg), and objects (cats, dogs, cars). We then remove its last fully connected layers and add our own layers to classify the objects we need to identify.
This way we save much of the time and resources needed to train a model from scratch, which can take weeks on very powerful machines with many GPUs.
Before identifying the breed, we have to detect whether the image shows a dog or a human. OpenCV's Haar feature-based cascade classifiers were used to detect human faces, and ResNet50 was used to detect dogs.
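The routing between the two detectors can be sketched as below. This is a minimal illustration, not the code from the repository: the function names and the three callables passed in are hypothetical placeholders. In practice, `count_faces` might wrap OpenCV's `CascadeClassifier.detectMultiScale`, and `predict_imagenet_index` might return the argmax of a ResNet50 ImageNet prediction. The dog index range 151 to 268 comes from the ImageNet label set rather than from this article.

```python
from typing import Callable

# ImageNet class indices 151-268 (inclusive) are dog breeds. This range is a
# property of the ImageNet label set, not something stated in the article.
DOG_CLASS_RANGE = range(151, 269)


def is_dog_class(imagenet_index: int) -> bool:
    """Return True if an ImageNet class index falls in the dog range."""
    return imagenet_index in DOG_CLASS_RANGE


def describe_image(
    image: object,
    count_faces: Callable[[object], int],
    predict_imagenet_index: Callable[[object], int],
    predict_breed: Callable[[object], str],
) -> str:
    """Route an image: dog -> its breed, human -> resembling breed, else reject.

    All three callables are hypothetical stand-ins for the real detectors
    and the breed classifier.
    """
    if is_dog_class(predict_imagenet_index(image)):
        return f"Dog detected, predicted breed: {predict_breed(image)}"
    if count_faces(image) > 0:
        return f"Human detected, resembles: {predict_breed(image)}"
    return "Neither a dog nor a human face was detected"
```

Passing the detectors in as arguments keeps the routing logic independent of any particular library, so it can be tried out with simple stub functions before the real models are wired in.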
For the model built from scratch, 6,680 dog images were used for training, 835 for validation, and 836 for testing (roughly an 80/10/10 split), with the following CNN architecture:
Three convolutional layers with 16, 32, and 64 filters respectively. Each filter is 2×2 with ‘same’ padding, so no features at the image borders are lost, and the first layer takes an input of shape 224×224×3. Each layer uses the ‘relu’ activation function and the default stride of 1.
Each convolutional layer is followed by a max pooling layer of size 2×2, which halves the height and width of the feature maps.
A dropout layer then drops 20% of the nodes during training to reduce overfitting.
After that, a GlobalAveragePooling2D layer pools the features.
The last layer is a dense layer with 133 nodes, one per category to classify, and a softmax activation function.
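The architecture described above might look roughly like this in Keras. This is a sketch assuming the `tensorflow.keras` API. The optimizer and loss in the `compile` call are assumptions, since the article does not name them, and `build_scratch_cnn` is just an illustrative name.

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_BREEDS = 133  # number of dog breed categories in the article


def build_scratch_cnn() -> keras.Model:
    """Build the small from-scratch CNN described in the text."""
    model = keras.Sequential(name="scratch_cnn")
    model.add(keras.Input(shape=(224, 224, 3)))
    # Three conv blocks: 2x2 filters, 'same' padding, ReLU, default stride 1,
    # each followed by 2x2 max pooling that halves height and width.
    for filters in (16, 32, 64):
        model.add(layers.Conv2D(filters, kernel_size=2, padding="same",
                                activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=2))
    model.add(layers.Dropout(0.2))              # drop 20% of units in training
    model.add(layers.GlobalAveragePooling2D())  # 28x28x64 -> 64 features
    model.add(layers.Dense(NUM_BREEDS, activation="softmax"))
    return model


model = build_scratch_cnn()
# Optimizer and loss are assumptions; the article does not specify them.
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Built this way, the network has only 19,189 trainable parameters, which is tiny for a 133-class image problem.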
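After training this model, the test accuracy was only about 4%. That is better than the less than 1% expected from random guessing among 133 breeds, but still far too low to be useful.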
Moving on to transfer learning, a pre-trained ResNet50 network was used and the following steps were performed:
The last fully connected layers are removed from ResNet50.
A GlobalAveragePooling2D layer is used to pool the features.
A dropout layer is added to reduce overfitting.
A dense layer with 133 nodes and a softmax activation function is added for classification.
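The steps above can be sketched in Keras as follows. This is one possible arrangement, not necessarily the one used in the repository. Freezing the ResNet50 base and the dropout rate of 0.2 are both assumptions, as the article states neither. The function name `build_transfer_model` and its parameters are illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50


def build_transfer_model(weights="imagenet", dropout_rate=0.2, num_breeds=133):
    """ResNet50 without its top layers, plus a small classification head.

    dropout_rate=0.2 and the frozen base are assumptions for this sketch.
    """
    # include_top=False removes ResNet50's final fully connected layers.
    base = ResNet50(weights=weights, include_top=False,
                    input_shape=(224, 224, 3))
    base.trainable = False  # assumption: only the new head is trained

    inputs = keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)        # 7x7x2048 feature maps
    x = layers.GlobalAveragePooling2D()(x)  # -> 2048 features
    x = layers.Dropout(dropout_rate)(x)
    outputs = layers.Dense(num_breeds, activation="softmax")(x)
    return keras.Model(inputs, outputs, name="resnet50_transfer")
```

Calling `build_transfer_model()` downloads the ImageNet weights on first use. Input images would normally be passed through `tensorflow.keras.applications.resnet50.preprocess_input` before being fed to the model.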
After training this model, the test accuracy was about 82%, which is very good compared to the model built from scratch.
For more information and the source code, please refer to my repository for this topic.
The model could be improved by further tuning hyperparameters (number of epochs, dropout rate, batch size) or by doing more image pre-processing and augmentation. Here are some output samples: