Intro to Deep Learning and Dog Breed Recognition

Source: Deep Learning on Medium

A Simple Deep Neural Network on Keras to Predict 133 Dog Breeds

Before You Start Reading !!

If you have no idea what Artificial Neural Networks are, I recommend that you don’t read this article. This post mainly targets IT people who have a general background in Artificial Neural Networks. As we will see later, I will walk you through a simple example built with Keras (Keras is a simplified framework for Artificial Neural Networks that is built on top of TensorFlow). The whole purpose of this exercise is to give you a taste of what it looks like to build and train a Deep Learning model using Artificial Neural Networks, although this example is an oversimplification.

Problem Statement

The problem we will be working on is to build a predictive model that can tell you which breed of dog appears in an image.

You may wonder, “Why build such a model?” My reply is that there are 200 to 350 breeds of dogs. If you think humans should be capable of recognizing all of them, I would say “Not a chance”. For example, below is an image of an Alaskan Malamute and a Siberian Husky. They are strikingly similar, but they are also different. One might develop some expertise in recognizing the different breeds of dogs. However, doing that for 200 breeds (to say the least) is certainly a machine’s job.

Here is another example of look-alike breeds. Below are images of a Welsh Springer Spaniel and a Brittany. Can you tell the difference?

Our Data … Split for Training, Validation, and Testing

Thankfully, we have plenty of data, split into three sets. One is for training, and it’s the biggest of the three. The other two are for validation and testing. I’m not going to dive into the reasons why such a split is necessary. So, let’s peek into our data.

As you can see, our training set contains many folders, one for each breed. Inside each folder, we have many example images for training our prediction model.

The dataset I have contains 133 different breeds, so we certainly cannot cover all the breeds out there in the world. However, this is a good start, as our model will develop the ability to recognize new images of these 133 breeds.

Before I march forward and try to build a predictive model, let’s take a look at the distribution of the data across the 133 breeds.

  • We can see that our training data for the different breeds of dogs is not fully balanced. For the Norwegian Buhund, for example, we only have around 25 training examples, while the Alaskan Malamute has 85 training examples.
  • This imbalance is not necessarily going to cause issues, as we have plenty of data for each breed anyway. However, we should not be surprised if the model underperforms when predicting those breeds that have fewer training samples.
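
If you want to reproduce a per-breed count like the one discussed above, a few lines of standard-library Python are enough. This is only a sketch: it assumes the layout described earlier (one sub-folder per breed inside the training folder), and the function names `count_images_per_breed` and `summarize` are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def count_images_per_breed(train_dir):
    """Count the files in each breed sub-folder of train_dir."""
    counts = Counter()
    for breed_dir in Path(train_dir).iterdir():
        if breed_dir.is_dir():
            # every file inside a breed folder is treated as one training example
            counts[breed_dir.name] = sum(1 for f in breed_dir.iterdir() if f.is_file())
    return counts


def summarize(counts):
    """Return the least and most represented breeds as (breed, count) pairs."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1])
    return ordered[0], ordered[-1]
```

Calling `summarize(count_images_per_breed(...))` on the training folder should point you to the least and most represented breeds, which are the ones to watch when you look at per-breed errors later.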

So far so good, let’s now jump to the code and see what we can do with the data we have.

Software Packages

Please note that the entirety of this exercise is conducted in a Jupyter Notebook. Multiple libraries are used to load the data, define and train the models, and test the final artifacts.

The import list below should give you a good idea of how many tools we are going to use.

import pandas as pd
from sklearn.datasets import load_files

import numpy as np
from glob import glob
import random
import cv2
from tqdm import tqdm
from keras.utils import np_utils

from keras.applications.resnet50 import ResNet50,preprocess_input, decode_predictions
from keras.preprocessing import image
from keras.callbacks import ModelCheckpoint
from extract_bottleneck_features import *
from PIL import ImageFile
import matplotlib.pyplot as plt
%matplotlib inline

Loading Data

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
data_folder_path = '../../../data'
train_files, train_targets = load_dataset(data_folder_path+'/dog_images/train')
valid_files, valid_targets = load_dataset(data_folder_path+'/dog_images/valid')
test_files, test_targets = load_dataset(data_folder_path+'/dog_images/test')
dog_names = [item[35:-1] for item in sorted(glob(data_folder_path+"/dog_images/train/*/"))]
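
Two lines in the loading code are less obvious than the rest: the one-hot encoding done by `to_categorical` and the `item[35:-1]` slice. Here is a small sketch with made-up data to make them concrete. It assumes the breed folders are named like `001.Affenpinscher` (a numeric prefix, a dot, then the breed name), which is what the slice implies for this particular `data_folder_path`. The helper names `one_hot` and `breed_name` are hypothetical.

```python
import os

import numpy as np


def one_hot(labels, num_classes):
    """NumPy equivalent of one-hot encoding integer class labels."""
    return np.eye(num_classes, dtype="float32")[np.asarray(labels)]


def breed_name(folder_path):
    """Extract the breed name from a folder such as '.../train/001.Affenpinscher/'."""
    folder = os.path.basename(os.path.normpath(folder_path))
    # drop the assumed numeric prefix ("001.") without hard-coding a character offset
    return folder.split(".", 1)[-1]


targets = one_hot([0, 2], num_classes=3)
example = "../../../data/dog_images/train/001.Affenpinscher/"
print(targets)
print(example[35:-1], breed_name(example))
```

The fixed offset of 35 only works while the data folder path keeps exactly this length, so a helper like `breed_name` may be the safer choice if you move the data.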

Data Preprocessing

In order for us to be able to process the images, we first need to resize them to an input size the models can take. Accordingly, we have the two functions below, which take a path to an image file (or a list of paths) and return a tensor (a multi-dimensional numerical array) containing the RGB pixel values of the resized image(s).

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # Preprocessing: convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # Preprocessing: convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
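
If the shapes in the comments above feel abstract, this small NumPy sketch shows what the two functions do to them. It uses arrays of zeros as stand-ins for decoded images, so no image files are needed.

```python
import numpy as np

# stand-ins for three decoded 224x224 RGB images (pixel content does not matter here)
fake_images = [np.zeros((224, 224, 3), dtype="float32") for _ in range(3)]

# path_to_tensor adds a leading batch axis to each image ...
batched = [np.expand_dims(img, axis=0) for img in fake_images]
# ... and paths_to_tensor stacks the results along that axis
stacked = np.vstack(batched)

print(batched[0].shape)  # (1, 224, 224, 3)
print(stacked.shape)     # (3, 224, 224, 3)
```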

Another important step is to scale the RGB values stored in each pixel. As you know, RGB values are between 0 and 255 inclusive. So, if we divide them by 255, we get values in the range 0 to 1. This step is called normalization (min-max scaling). There are other ways to scale your data (e.g. standardization to zero mean and unit variance), but we will not cover them in this article.

ImageFile.LOAD_TRUNCATED_IMAGES = True

# pre-process the data for Keras
# preprocessing here is merely dividing each pixel value by 255
train_tensors = paths_to_tensor(train_files).astype('float32')/255
valid_tensors = paths_to_tensor(valid_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255
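
To see the scaling on a tiny example, here is a sketch with four made-up pixel values. The standardization lines are included only for contrast with the division by 255 used in this article. They are not part of the pipeline above.

```python
import numpy as np

pixels = np.array([[0, 64, 128, 255]], dtype="uint8")

# scaling as used in this article: the known range 0..255 is mapped to 0..1
scaled = pixels.astype("float32") / 255

# standardization, shown only for contrast: zero mean and unit variance
as_float = pixels.astype("float32")
standardized = (as_float - as_float.mean()) / as_float.std()

print(scaled)
print(standardized)
```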

Building a Deep Neural Net from Scratch

After our data has been preprocessed properly, it’s time to build the architecture of the Deep Artificial Neural Network. We will use Keras for this step and create a Sequential object to contain the multiple layers of our deep neural net.

Note that the input of the first layer of the Sequential has an explicit input size of 224 x 224 pixels with 3 channels for the RGB color codes of each pixel.

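from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import GlobalAveragePooling2D, Dense

def build_model():
    model = Sequential()
    model.add(Conv2D(filters=16, kernel_size=2, activation='relu', input_shape=(224, 224, 3)))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Conv2D(filters=32, kernel_size=2, activation='relu'))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Conv2D(filters=64, kernel_size=2, activation='relu'))
    model.add(MaxPooling2D(pool_size=2))
    # collapse each of the 64 feature maps to one value so Dense outputs one 133-way vector per image
    model.add(GlobalAveragePooling2D())
    model.add(Dense(units=133, activation='softmax'))
    return model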

It’s also important to note that the layers we have are of different types (e.g. Conv2D, MaxPooling2D, GlobalAveragePooling2D, and Dense). These are standard components of Artificial Neural Networks that you can read about. You can think of them as Lego building blocks for our Deep Network.

Now, let’s build the model and peek into its summary. Thankfully, these are out-of-the-box functions that Keras offers.

model = build_model()
model.summary()

Considering the fact that this deep net has lots of links between the perceptrons, it’s not very surprising to see that we have around 19,000 parameters to train. That is how big even a simple deep neural net gets, and that’s why we use GPUs to train all of these parameters.
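
The figure of roughly 19,000 can be checked by hand. The sketch below assumes the layer sizes listed in build_model (three 2x2 convolutions with 16, 32, and 64 filters on a 3-channel input) and a 133-unit dense layer that receives 64 values per image. Pooling layers, where present, add no trainable parameters. The helper names are made up for illustration.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights (kernel_size x kernel_size x in_channels per filter) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units


layers = [
    conv2d_params(2, 3, 16),   # first convolution on the RGB input
    conv2d_params(2, 16, 32),  # second convolution
    conv2d_params(2, 32, 64),  # third convolution
    dense_params(64, 133),     # 64 values in, 133 breeds out
]
total = sum(layers)
print(layers, total)  # [208, 2080, 8256, 8645] 19189
```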

Training the Model

Well, well, well … our model is now almost ready for training. However, before we proceed with training, we need to compile our model.

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

Compiling the model tells the Keras framework “I want to use the rmsprop optimizer (Root Mean Square Propagation)”, “I want my loss function for the optimization to be categorical cross-entropy”, and “I want to use accuracy as my metric for evaluating the model’s performance while it’s training”.
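
To give a feel for what the loss and the metric measure, here is a NumPy sketch of both on two made-up predictions over three classes. It illustrates the formulas only and is not how Keras implements them internally.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))


y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype="float32")
y_pred = np.array([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]], dtype="float32")

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The loss keeps shrinking as the probability given to the correct class grows, even when the predicted class does not change. That is why the optimizer works on the loss while we humans prefer to read the accuracy.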

One more thing to prepare is a checkpointer for your model. The checkpointer runs alongside the optimizer and saves the model every time training produces a better one than the best seen so far.

checkpointer = ModelCheckpoint(filepath='saved_models/',
                               verbose=1, save_best_only=True)
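
The idea behind save_best_only can be sketched in plain Python. ModelCheckpoint watches the validation loss by default, so the sketch below walks through a list of made-up validation losses and records the epochs at which a file would be written. The function name is hypothetical, and this is a simplification of what the callback does.

```python
def epochs_that_would_save(val_losses):
    """Return the 1-based epochs at which a 'save only the best' rule writes a file."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # improvement: overwrite the stored model
            best = loss
            saved.append(epoch)
    return saved


print(epochs_that_would_save([4.9, 4.7, 4.8, 4.6, 4.6]))  # [1, 2, 4]
```

Epochs 3 and 5 are skipped because they do not beat the best loss seen so far, so the file on disk always holds the best model up to that point.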

Now it’s time to run the model training by calling the fit function on the model. Notice that we pass the training tensors, training targets, validation tensors, and validation targets, along with the requested number of epochs, the batch_size, and our checkpointer object.

epochs = 5
model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=epochs, batch_size=20, callbacks=[checkpointer], verbose=1)

Now, the model is training. You will see the accuracy on training data and validation data, along with the loss values calculated by the optimizer.

Testing the Model Performance

Once the training is over, we can reload the best version of the trained parameters from the file the checkpointer saved.

model.load_weights('saved_models/')

Awesome … we should be able to test our Deep Neural Net now. For that, we make predictions on the test data and compare them to the test targets.

dog_breed_predictions = [np.argmax(model.predict(np.expand_dims(tensor, axis=0))) for tensor in test_tensors]

# report test accuracy
test_accuracy = 100*np.sum(np.array(dog_breed_predictions)==np.argmax(test_targets, axis=1))/len(dog_breed_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

Here np.argmax picks the most likely dog breed according to the model’s prediction. The result is then compared to the actual dog breed.

Test accuracy: 1.1962%
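
Here is the same accuracy computation on a toy example with made-up numbers (4 images, 3 classes), so you can follow each step. The last line adds a point of reference that is my own addition: the accuracy you would expect from guessing uniformly at random among 133 breeds.

```python
import numpy as np

# made-up softmax outputs for 4 images and 3 classes
predicted_probs = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
])
# one-hot targets for the same 4 images
targets = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
])

predicted_classes = np.argmax(predicted_probs, axis=1)  # most likely class per image
true_classes = np.argmax(targets, axis=1)               # position of the 1 in each one-hot row
toy_accuracy = 100 * np.sum(predicted_classes == true_classes) / len(true_classes)

# accuracy of uniform random guessing over 133 breeds, for comparison
chance_accuracy = 100 / 133
print(toy_accuracy, round(chance_accuracy, 4))
```

Random guessing lands at roughly 0.75%, so the 1.1962% reported above is only slightly better than chance.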

Unfortunately, our model performs poorly. This is, first of all, due to the fact that the training did not run for enough time. Remember, we used only 5 epochs. Furthermore, even if we ran the training for more epochs, the result would not be very good either, because the architecture we used is not good enough for the task.

In the next section, we will build a different architecture and try again.

Build a Deep Neural Network using a Pretrained Xception Network

In this attempt, I will use a pretrained network for feature recognition. The pretrained network I will be using is Xception (read more here). Xception is one of many pretrained Deep Neural Networks that can be utilized for feature recognition in images. Other examples include VGG16, VGG19, ResNet, ResNetV2, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, DenseNet, and NASNet. These pretrained neural networks are suitable for feature recognition without the need for further training. They are like large, prebuilt, robust, and powerful Lego blocks that one can use in any sort of application in Deep Neural Networks.

First, let’s load the pretrained Neural Net. Note that utilizing pretrained Neural Nets is called transfer learning.

### Obtain bottleneck features from another pre-trained CNN.
bottleneck_features = np.load('../../../data/bottleneck_features/DogXceptionData.npz')
train_Xception = bottleneck_features['train']
valid_Xception = bottleneck_features['valid']
test_Xception = bottleneck_features['test']

Upon loading the pretrained Neural Net above, we also load the train, validation, and test bottleneck features. I will not go into the details of what these are, as that is outside the scope of this article for now.

Once we have successfully loaded the pretrained Network, let’s now define our deep neural network.

Here I am simply creating a new Sequential model and adding two more layers to the pretrained Xception model.

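Xception_model = Sequential()
Xception_model.add(GlobalAveragePooling2D(input_shape=train_Xception.shape[1:]))
Xception_model.add(Dense(133, activation='softmax'))
Xception_model.summary()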

Bingo … we now have 272,517 parameters to train. Please note that these do not include the parameters of the pretrained Neural Network. God forbid, if we added these too, we would end up with a behemoth that would take forever to train. Instead, we are just training the two additional layers we added to our model. These two additional layers sit on top of the pretrained Neural Network in order to provide the final output.
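
The number 272,517 can be verified with simple arithmetic. The sketch below assumes that the two added layers are a global average pooling layer followed by the 133-unit dense layer, and it relies on the fact that Xception’s final feature map has 2048 channels. The variable names are made up for illustration.

```python
bottleneck_channels = 2048  # depth of Xception's final feature map
num_breeds = 133

pooling_params = 0  # averaging has nothing to learn
dense_layer_params = bottleneck_channels * num_breeds + num_breeds  # weights + biases

trainable_total = pooling_params + dense_layer_params
print(trainable_total)  # 272517
```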

Training the Model

It’s time to compile our model now. I will still be using accuracy as the evaluation metric although, in a real-life scenario, accuracy is often not a good metric and one should use something more representative of the overall performance (say, the F1 score).

Xception_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
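
Since I mentioned the F1 score as a more representative metric, here is a minimal sketch of a macro-averaged F1 in plain Python, run on made-up labels. The function name `macro_f1` is hypothetical, and macro averaging is my choice for this illustration. In practice you would more likely reach for scikit-learn’s `f1_score` with `average='macro'`.

```python
def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class F1 scores over the classes present in true_labels."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for cls in sorted(set(true_labels)):
        tp = sum(t == cls and p == cls for t, p in pairs)  # correctly predicted as cls
        fp = sum(t != cls and p == cls for t, p in pairs)  # wrongly predicted as cls
        fn = sum(t == cls and p != cls for t, p in pairs)  # cls examples that were missed
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]
score = macro_f1(true_labels, predicted_labels)
print(round(score, 4))
```

Because every class contributes equally to the average, a breed with only a handful of test images weighs as much as a well-represented one, which is exactly what plain accuracy hides when the data is imbalanced.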

We should not forget the checkpointer to save our best model every time a new better model is found by the optimizer.

checkpointer = ModelCheckpoint(filepath='saved_models/',
                               verbose=1, save_best_only=True)

Now we proceed with training for 20 epochs. Note that you may want to use a GPU-powered machine here, as the training might take a ridiculous amount of time if run on a regular CPU.

Xception_model.fit(train_Xception, train_targets,
          validation_data=(valid_Xception, valid_targets),
          epochs=20, batch_size=20, callbacks=[checkpointer], verbose=1)

Testing the Model

Cool .. our model has been trained. Let’s load the best model and test it on the test dataset.

Xception_model.load_weights('saved_models/')

# get index of predicted dog breed for each image in test set
Xception_predictions = [np.argmax(Xception_model.predict(np.expand_dims(feature, axis=0))) for feature in test_Xception]

# report test accuracy
test_accuracy = 100*np.sum(np.array(Xception_predictions)==np.argmax(test_targets, axis=1))/len(Xception_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

As you can see, the accuracy on the test data is about 83%. This is excellent progress.

Test accuracy: 82.8947%

Further Refinements

There are three approaches to further refine our model here.

  • Approach 1: make the training run for more epochs.
  • Approach 2: change/add/delete the layers that sit on top of the pre-trained network in order to reach a better classification performance.
  • Approach 3: perform data augmentation on the training data so that the model has more varied data to train on.
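
To give a flavor of what data augmentation means in Approach 3, here is a minimal NumPy sketch that randomly mirrors images left to right. A mirrored dog is still the same breed, so the labels stay valid. The function name `augment` and the tiny made-up batch are for illustration only. A real pipeline would typically also use rotations, shifts, and zooms.

```python
import numpy as np


def augment(batch, rng):
    """Randomly mirror each image left-to-right; batch has shape (N, height, width, channels)."""
    out = batch.copy()
    flip = rng.random(len(batch)) < 0.5   # pick roughly half of the images
    out[flip] = out[flip][:, :, ::-1, :]  # reverse the width axis of the chosen images
    return out


rng = np.random.default_rng(0)
batch = np.arange(2 * 4 * 4 * 3, dtype="float32").reshape(2, 4, 4, 3)
augmented = augment(batch, rng)
print(augmented.shape)
```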

A Final Test: What can the model do?

For now, I will proceed with no further refinement and test my model on some test data I have.

def Xception_predict_breed(img_path):
    # extract bottleneck features
    bottleneck_feature = extract_Xception(path_to_tensor(img_path))
    # obtain predicted vector
    predicted_vector = Xception_model.predict(bottleneck_feature)
    # return dog breed that is predicted by the model
    return dog_names[np.argmax(predicted_vector)]

Let’s run this prediction function on some images we have.

for path in glob("./images/*"):
    print(path, Xception_predict_breed(path))

Oh boy … it’s finally working. We can see that the performance is quite satisfactory. At least, I will not be googling things about dog breeds anymore because I got this predictive model to help me out.