How to Build a Dog Breed Classifier using CNN?

Original article was published on Deep Learning on Medium

Who’s a good dog? Who likes ear scratches? Well, it seems those fancy deep neural networks don’t have all the answers. However, maybe they can answer that ubiquitous question we all ask when meeting a four-legged stranger: what kind of good pup is that?

It is not difficult to find dog breed pairs with minimal inter-class variation (for instance, Curly-Coated Retrievers and American Water Spaniels).

Curly-Coated Retrievers and American Water Spaniels

In this blog post, I’ll walk through a Convolutional Neural Network (CNN), a deep neural network classifier that can identify dog breeds using the Dog Breed Dataset. I’ll explain how to train a CNN from scratch and measure its accuracy on the test data. I also use transfer learning with different pre-trained models to improve accuracy and get better classifications.

Expected Output from Dog Breed Classifier

Steps To Be Followed

  • Step 0: Import Datasets
  • Step 1: Detect Humans
  • Step 2: Detect Dogs
  • Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
  • Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 6: Write your Algorithm
  • Step 7: Test Your Algorithm

In this project, we use Keras to build the Convolutional Neural Network (CNN) that classifies dog breeds. The images contain two kinds of subjects, humans and dogs, and the model’s task is to predict the dog breed for images in the test data. Ideally, the CNN would reach 90%+ accuracy, but the Udacity criterion is to reach 60% accuracy on the test data.

Step 0: Import Datasets

In this project, we have to download two datasets:

  • Dog Breed Dataset
  • Human Faces Dataset

We have to build a load function that can load both datasets.

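import numpy as np
from keras.utils import np_utils
from sklearn.datasets import load_files

# load file paths and one-hot encoded targets (133 dog breed classes)
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets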
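To make the target encoding concrete, here is a small NumPy-only sketch of what a one-hot encoding of integer labels looks like. The helper name `one_hot` is hypothetical; it only illustrates the shape of `dog_targets` and is not meant to replace the Keras utility.

```python
import numpy as np

def one_hot(labels, num_classes=133):
    """Illustrative stand-in for one-hot encoding: one row per label, a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Three made-up breed indices out of 133 classes
targets = one_hot([0, 95, 132])
print(targets.shape)  # (3, 133)

# argmax along each row recovers the original integer labels
recovered = targets.argmax(axis=1)
```

Each row has exactly one non-zero entry, which is why `np.argmax(..., axis=1)` is used later to turn targets back into class indices.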

To use this function, we load the datasets into our IPython notebook. The dog breed data comes as three separate sets: train, validation, and test.

# load train, test, and validation datasets
train_files, train_targets = load_dataset('../../../data/dog_images/train')
valid_files, valid_targets = load_dataset('../../../data/dog_images/valid')
test_files, test_targets = load_dataset('../../../data/dog_images/test')

Step 1: Detect Humans

We use OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors, stored as XML files on GitHub. We have downloaded one of these detectors and stored it in the haarcascades directory.

import cv2

# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# returns "True" if face is detected in image stored at img_path
def face_detector(img_path):
    img = cv2.imread(img_path)
    # the cascade classifier works on grayscale images
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

The output of the OpenCV features classifier looks like

Haar-cascade classification

Ideally, we would like 100% of human images with a detected face and 0% of dog images with a detected face. You will see that our algorithm falls short of this goal but still gives an acceptable performance. We extract the file paths for the first 100 images from each of the datasets and store them in the NumPy arrays human_files_short and dog_files_short.

human_files_short = human_files[:100]
dog_files_short = train_files[:100]

  • Percentage of the first 100 images in human_files with a detected human face: 100.0%
  • Percentage of the first 100 images in dog_files with a detected human face: 11.0%
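The percentages above can be computed with a small helper. This is a minimal sketch, not the notebook's exact code: `detection_rate` is a hypothetical name, and a stand-in detector with made-up file names is used so the example runs without OpenCV or image files.

```python
def detection_rate(detector, img_paths):
    """Return the percentage of paths for which detector(path) is truthy."""
    if len(img_paths) == 0:
        return 0.0
    hits = sum(1 for path in img_paths if detector(path))
    return 100.0 * hits / len(img_paths)

# Stand-in detector (assumption for illustration only): it "detects" a face
# whenever the made-up file name contains the word "face".
def fake_face_detector(path):
    return "face" in path

sample_paths = ["face_01.jpg", "face_02.jpg", "dog_01.jpg", "dog_02.jpg"]
rate = detection_rate(fake_face_detector, sample_paths)
print("%.1f%%" % rate)  # 50.0%
```

In the notebook, the same helper would be called with the real detector, for example `detection_rate(face_detector, human_files_short)`.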

Problem with given Datasets:

This algorithmic choice necessitates that we communicate to the user that we accept human images only when they provide a clear view of a face (otherwise, we risk having unnecessarily frustrated users!). In your opinion, is this a reasonable expectation to pose on the user? If not, can you think of a way to detect humans in images that does not necessitate an image with a clearly presented face?


To detect objects in images, we use a machine learning-based approach, the Haar cascade, which locates the desired object in a given image. Such a detector is trained on both positive images (containing the desired faces) and negative images (without them). In our case, the data contains two kinds of subjects, dogs and humans, and we need to tell them apart. The main idea is: if a face is detectable, the image shows a human; otherwise, it shows some other object.

Step 2: Detect Dogs

We use a pre-trained ResNet-50 model to detect dogs in images. Our first line of code downloads the ResNet-50 model, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.

from keras.applications.resnet50 import ResNet50

# define ResNet50 model
ResNet50_model = ResNet50(weights='imagenet')
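The dog check used later in the algorithm (`dog_detector`) is not shown in this post. A minimal sketch of the idea, assuming the standard ordering of the 1000 ImageNet classes in which indices 151 through 268 are dog breeds: take the index of the highest-scoring class and test whether it falls in that range. The function name is hypothetical, and the score list is a placeholder for the model's real output.

```python
DOG_CLASS_FIRST = 151  # assumed first dog index in the ImageNet class list
DOG_CLASS_LAST = 268   # assumed last dog index (inclusive)

def is_dog_prediction(class_scores):
    """Return True if the highest-scoring class index lies in the dog range."""
    top_class = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return DOG_CLASS_FIRST <= top_class <= DOG_CLASS_LAST

# Placeholder score vectors with 1000 entries (made-up values)
dog_like = [0.0] * 1000
dog_like[200] = 0.9      # top class inside the dog range
other = [0.0] * 1000
other[500] = 0.9         # top class outside the dog range

print(is_dog_prediction(dog_like))  # True
print(is_dog_prediction(other))     # False
```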

Data Pre-processing

When using TensorFlow as backend, Keras CNNs require a 4D array (which we’ll also refer to as a 4D tensor) as input, with the shape

(nb_samples, rows, columns, channels)

where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.

The path_to_tensor function below takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224 × 224 pixels. Next, the image is converted to an array, which is then resized to a 4D tensor. In this case, since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have the shape

(1, 224, 224, 3)

The paths_to_tensor function takes a NumPy array of string-valued image paths as input and returns a 4D tensor with shape

(nb_samples, 224, 224, 3)

Here, nb_samples is the number of samples, or number of images, in the supplied array of image paths. It is best to think of nb_samples as the number of 3D tensors (where each 3D tensor corresponds to a different image) in your dataset!

import numpy as np
from keras.preprocessing import image
from tqdm import tqdm

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    # one (1, 224, 224, 3) tensor per path, stacked along the sample axis
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
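The shape handling in these two functions can be checked without any image files. The sketch below uses zero-filled placeholder arrays in place of loaded images (an assumption made purely so the example is self-contained) and applies the same `np.expand_dims` and `np.vstack` calls.

```python
import numpy as np

# Three placeholder "images", each already a 3D array of shape (224, 224, 3)
fake_images = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(3)]

# Same step as path_to_tensor: add a leading sample axis
single_tensors = [np.expand_dims(img, axis=0) for img in fake_images]

# Same step as paths_to_tensor: stack along the sample axis
batch = np.vstack(single_tensors)

print(single_tensors[0].shape)  # (1, 224, 224, 3)
print(batch.shape)              # (3, 224, 224, 3)
```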

Step 3: Create a CNN to Classify Dog Breeds (from Scratch)

Now that we have functions for detecting humans and dogs in images, we need a way to predict breed from images. In this step, you will create a CNN that classifies dog breeds. You must create your CNN from scratch (so, you can’t use transfer learning yet!), and you must attain a test accuracy of at least 1%. In Step 5 of this IPython notebook, you will have the opportunity to use transfer learning to create a CNN that attains greatly improved accuracy.

Model Architecture

Create a CNN to classify dog breeds. At the end of your code cell block, summarize the layers of your model by executing the line:

model.summary()

Architecture of CNN

Question to Be Answered:

Why do you think that a CNN architecture should work well for the image classification task?

A Convolutional Neural Network is well suited to image data. Its convolutional layers learn filters that pick up patterns such as particular lines in the image, and the dense layers on top use those patterns to identify the object. Max-pooling reduces the spatial dimension from 224 to 28.

  • Used 3 convolutional layers, alternating with 3 MaxPooling layers. Initially, the architecture had no Dropout layers, and the model’s accuracy was much less than 1%.
  • The “relu” activation was used for every convolutional and dense layer except the last Dense layer, where we used the “softmax” activation because we want a classification over the different breeds. The number of output units equals the total number of dog breeds to be classified.
  • For the ‘padding’ parameter, we kept the value ‘same’, since we didn’t want to lose values when the convolution window slides past the edge of the image matrix.
  • In the compile method, we used ‘categorical_crossentropy’ as the loss function.

from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Sequential

model = Sequential()

# three Conv2D + MaxPooling2D pairs; each pooling step halves the spatial size (224 -> 112 -> 56 -> 28)
model.add(Conv2D(filters=16, kernel_size=2, padding='same', activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=64, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
# flatten the feature maps so the final Dense layer receives one vector per image
model.add(Flatten())
# model.add(Dense(500, activation='relu'))
model.add(Dense(133, activation='softmax'))
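The reduction from 224 to 28 mentioned above follows from simple arithmetic. The sketch below assumes that convolutions with `padding='same'` keep the spatial size and that each pooling layer uses a 2×2 window with a matching stride, so each one halves the size. The function name is hypothetical.

```python
def size_after_pooling(input_size, num_pool_layers, pool_size=2):
    """Spatial size after repeated pooling, assuming stride == pool_size and no padding."""
    size = input_size
    for _ in range(num_pool_layers):
        size //= pool_size
    return size

side = size_after_pooling(224, num_pool_layers=3)
print(side)  # 28

# With 64 filters in the last convolutional layer, flattening the
# 28 x 28 x 64 feature maps gives this many values per image:
flattened = side * side * 64
print(flattened)  # 50176
```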


Compile the Model

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

Train the Model

model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=epochs, batch_size=20, callbacks=[checkpointer], verbose=1)

Test the Model

# get index of predicted dog breed for each image in test set
dog_breed_predictions = [np.argmax(model.predict(np.expand_dims(tensor, axis=0))) for tensor in test_tensors]

# report test accuracy
test_accuracy = 100*np.sum(np.array(dog_breed_predictions)==np.argmax(test_targets, axis=1))/len(dog_breed_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)
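To see what the accuracy line computes, here is the same calculation on a tiny made-up example. The probabilities and targets below are placeholders, not results from the model.

```python
import numpy as np

# Placeholder softmax outputs for 4 images and 3 classes (made-up numbers)
predicted_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
# One-hot ground truth for classes 0, 1, 2, 1
true_targets = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

# argmax turns each row into a class index; accuracy is the share of matches
predicted_classes = np.argmax(predicted_probs, axis=1)
true_classes = np.argmax(true_targets, axis=1)
test_accuracy = 100 * np.sum(predicted_classes == true_classes) / len(predicted_classes)
print('Test accuracy: %.4f%%' % test_accuracy)  # Test accuracy: 75.0000%
```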

Step 4: Use a CNN to Classify Dog Breeds

To reduce training time without sacrificing accuracy, we show you how to train a CNN using transfer learning.

Obtain Bottleneck Features

bottleneck_features = np.load('bottleneck_features/DogVGG16Data.npz')
train_VGG16 = bottleneck_features['train']
valid_VGG16 = bottleneck_features['valid']
test_VGG16 = bottleneck_features['test']

Model Architecture

The model uses the pre-trained VGG-16 model as a fixed feature extractor, where the last convolutional output of VGG-16 is fed as input to our model. We only add a global average pooling layer and a fully connected layer, where the latter contains one node for each dog category and is equipped with a softmax.

VGG16_model = Sequential()
VGG16_model.add(GlobalAveragePooling2D(input_shape=train_VGG16.shape[1:]))
VGG16_model.add(Dense(133, activation='softmax'))
VGG16_model.summary()
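What these two layers do can be sketched in plain NumPy. The example below uses random placeholder features and weights, and it assumes bottleneck features of shape 7 × 7 × 512 per image (the usual VGG-16 convolutional output for a 224 × 224 input); the real shape should be read from `train_VGG16.shape`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder bottleneck features: 5 images, 7x7 spatial grid, 512 channels (assumed shape)
features = rng.random((5, 7, 7, 512))

# Global average pooling: average each channel over the two spatial axes
pooled = features.mean(axis=(1, 2))

# Fully connected softmax layer with random placeholder weights, one output per breed
weights = rng.normal(scale=0.01, size=(512, 133))
bias = np.zeros(133)
logits = pooled @ weights + bias
logits = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
exp_logits = np.exp(logits)
probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

print(pooled.shape)  # (5, 512)
print(probs.shape)   # (5, 133)
```

Pooling first keeps the trainable part small: only the 512 × 133 weights and 133 biases of the final layer are learned.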

As mentioned earlier, after defining the model we compile it and then train it on the given features. Afterward, we measure the model’s accuracy on the test data. The VGG-16 model above reaches a test accuracy of about 41%. Since this is too low, we try different pre-trained models with transfer learning and keep the one with the highest test accuracy.

Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)

In Step 4, we used transfer learning to create a CNN using VGG-16 bottleneck features. In this section, you must use the bottleneck features from a different pre-trained model. To make things easier for you, we have pre-computed the features for all of the networks that are currently available in Keras.

The files are encoded as such:

Dog{network}Data.npz

where {network}, in the above filename, can be one of VGG19, Resnet50, InceptionV3, or Xception. Pick one of the above architectures, download the corresponding bottleneck features, and store the downloaded file in the bottleneck_features/ folder in the repository.
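A small helper can build the file name for a chosen network, following the pattern of the VGG-16 file loaded in Step 4 (`bottleneck_features/DogVGG16Data.npz`). This is a sketch only; `bottleneck_path` and `AVAILABLE_NETWORKS` are hypothetical names.

```python
AVAILABLE_NETWORKS = ("VGG19", "Resnet50", "InceptionV3", "Xception")

def bottleneck_path(network, folder="bottleneck_features"):
    """Build the path of the pre-computed feature file for one network."""
    if network not in AVAILABLE_NETWORKS:
        raise ValueError("unknown network: %s" % network)
    return "%s/Dog%sData.npz" % (folder, network)

path = bottleneck_path("Resnet50")
print(path)  # bottleneck_features/DogResnet50Data.npz

# The file would then be loaded the same way as in Step 4, for example:
# bottleneck_features = np.load(path)
# train_features = bottleneck_features['train']
```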


Once you have the bottleneck features, the next step is to create a CNN model on top of them and compile it. After compilation, train the model and test its accuracy. To improve performance, tune the hyperparameters and repeat these steps until you reach your desired accuracy.

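Resnet50_model = Sequential()
# pool the bottleneck features into a vector first; train_Resnet50 is an assumed name, by analogy with train_VGG16 in Step 4
Resnet50_model.add(GlobalAveragePooling2D(input_shape=train_Resnet50.shape[1:]))
Resnet50_model.add(Dense(133, activation='softmax'))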


We build the model on the pre-trained Resnet50 features and repeat the steps above. After training, the model reaches a test accuracy of approximately 82% without any hyperparameter tuning.

Step 6: Write your Algorithm

Write an algorithm that accepts a file path to an image and first determines whether the image contains a human, dog, or neither. Then,

  • if a dog is detected in the image, return the predicted breed.
  • if a human is detected in the image, return the resembling dog breed.
  • if neither is detected in the image, provide an output that indicates an error.

def dog_breed_detector(img_path):
    img = cv2.imread(img_path)

    # convert BGR image to RGB
    cv2_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # display the image with bounding box

    if dog_detector(img_path):
        if face_detector(img_path):
            return print("Human and Dog both are detected, but dog breed is ", Resnet50_predict_breed(img_path))
        return print("Dog is detected and breed is ", Resnet50_predict_breed(img_path))
    elif face_detector(img_path):
        return print("Human is detected and breed is ", Resnet50_predict_breed(img_path))
    return print("Error in recognising the breed")
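The branching itself can be tested without any models by passing the detectors in as arguments. The sketch below is one possible way to structure it, not the notebook's code: `classify_image` and the stand-in lambdas are hypothetical, and the function returns the message instead of printing it so that the result can be checked.

```python
def classify_image(img_path, detect_dog, detect_face, predict_breed):
    """Return a message for the dog, human, both, or neither case."""
    dog_found = detect_dog(img_path)
    face_found = detect_face(img_path)
    if not dog_found and not face_found:
        return "Error: neither a dog nor a human was detected"
    breed = predict_breed(img_path)
    if dog_found and face_found:
        return "Human and dog detected; dog breed: " + breed
    if dog_found:
        return "Dog detected; breed: " + breed
    return "Human detected; resembling breed: " + breed

# Stand-in detectors keyed on made-up file names (assumptions for illustration)
is_dog = lambda path: "dog" in path
is_face = lambda path: "human" in path
breed_of = lambda path: "Labrador retriever"

print(classify_image("dog.jpg", is_dog, is_face, breed_of))
print(classify_image("human.jpg", is_dog, is_face, breed_of))
print(classify_image("car.jpg", is_dog, is_face, breed_of))
```

In the notebook, the three callables would be `dog_detector`, `face_detector`, and `Resnet50_predict_breed`.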

Step 7: Test Your Algorithm

Question to Be Answered:

How to improve model performance and accuracy?

To improve model accuracy:

  • Perform Data Augmentation
  • Tune the Hyperparameters to find the best-fit model
  • Try other pre-trained models

Sample output:

The dog is detected and the breed is ages/train/096.Labrador_retriever
Human and Dog both are detected, but a dog breed is ages/train/060.Dogue_de_bordeaux
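The breed labels in this output still carry a path fragment (`ages/train/096.`), which suggests they were cut from the class directory path at a fixed character position. A more robust option is to derive the name from the folder itself. This is a sketch under that assumption; `breed_name_from_dir` is a hypothetical helper, and the directory layout is taken from the paths used in Step 0.

```python
import os

def breed_name_from_dir(class_dir):
    """Turn '.../train/096.Labrador_retriever/' into 'Labrador retriever'."""
    # normpath drops the trailing slash so basename returns the folder name
    folder = os.path.basename(os.path.normpath(class_dir))
    # drop the numeric prefix before the first dot, if there is one
    _, _, name = folder.partition('.')
    return (name or folder).replace('_', ' ')

label = breed_name_from_dir('../../../data/dog_images/train/096.Labrador_retriever/')
print(label)  # Labrador retriever
```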


We’ve built a CNN to classify dog breeds. With Keras, it’s easy not only to build the CNN model but also to try different pre-trained models through transfer learning and compare their accuracy. In this blog post, we trained a CNN from scratch and used the VGG-16 and Resnet50 transfer learning models, measuring their accuracy on our test data. Transfer learning gave better accuracy, about 80%. We also discussed how to improve model performance; for further details, check my GitHub repository.

How to Build a Dog Breed Classifier using CNN from Scratch and Transfer Learning Techniques?