Fast.ai Season 1 Episode 2.2 — “DOG BREED CLASSIFICATION”



Welcome to the 2nd part of Episode 2, where we take on the Dog Breed Classification problem. We have images of 120 dog breeds which we have to classify. Before we start, I would like to thank Jeremy Howard and Rachel Thomas for their efforts to democratize AI. Thanks also to the awesome fast.ai community.

For those who haven’t seen the previous episode, please click here to check out Episode 2.1. To save you some time, I’ll quickly recap it here. Below are the steps we followed to build a State of the Art CLASSIFIER:-

  1. ENABLE DATA AUGMENTATION AND SET PRECOMPUTE=TRUE.
  2. USE lr_find() TO FIND THE HIGHEST LEARNING RATE WHERE THE LOSS IS STILL CLEARLY IMPROVING.
  3. TRAIN THE LAST LAYER FROM PRECOMPUTED ACTIVATIONS FOR A COUPLE OF EPOCHS.
  4. TRAIN THE LAST LAYER WITH DATA AUGMENTATION (i.e. PRECOMPUTE=FALSE) FOR 2–3 EPOCHS WITH CYCLE_LEN=1.
  5. UNFREEZE ALL THE LAYERS.
  6. SET EARLIER LAYERS TO A LOWER LEARNING RATE THAN LATER LAYERS AND TRAIN.
  7. USE lr_find() AGAIN.
  8. TRAIN THE FULL NETWORK WITH cycle_mult=2 UNTIL OVERFITTING.

So in this blog post we will deal with Dog Breed Identification. The link to this Kaggle dataset is here. The main aim is to classify images of 120 dog breeds.

1. DOWNLOAD THE DATA AND IMPORT THE PACKAGES THAT WILL BE USED.

So to get started we have to download the data. The easiest way to get it is the Kaggle API. To know more about this API, please check here. The bottom line is:

  • Install the kaggle API.
  • Import it.
  • Use it to download the data to the path you want. The steps mentioned have been performed in the snapshot below.
Downloading data done right

Moving forward, let’s import all the required packages:-

The glob package matches files using the patterns the shell uses. In the code below, glob helps to get all the files in the mentioned path. The output is shown from line #7 onwards in the gist below.
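A minimal sketch of that glob call (the PATH value here is an assumption on my part; point it at wherever you downloaded the data):

```python
from glob import glob

# Hypothetical data directory; adjust to wherever the Kaggle files landed.
PATH = 'data/dogbreed/'

# glob expands shell-style wildcards: here, every file directly under PATH.
files = glob(PATH + '*')
print(files)  # list of matched paths (empty if PATH does not exist yet)
```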

2. CHECK FOR GPU AVAILABILITY AND UNZIP THE DOWNLOADED FILES

To make sure the GPU is available to you, run the following commands. Both should return True.

Cuda done right

After downloading all the files, the following code helps to unzip them.

unzipppp
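If you prefer plain Python over shell commands, the unzipping step can be sketched with the standard zipfile module (the archive names in the comment are assumptions based on the Kaggle download):

```python
import os
import zipfile

def unzip(zip_path, dest_dir):
    """Extract zip_path into dest_dir, creating the directory if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)

# e.g. for each archive Kaggle gave us (names assumed):
# for name in ['train.zip', 'test.zip', 'labels.csv.zip', 'sample_submission.csv.zip']:
#     unzip(name, 'data/dogbreed')
```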

3. GET FAMILIAR WITH THE DATA USING PANDAS

import PANDAS as 🐼

Now let’s inspect the data. After unzipping the train and test zip files we get images of 120 dog breeds. The sample_submission.csv file tells us the format the competition expects during submission. Let’s see the content of the labels.csv file.
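A sketch of what that inspection looks like with pandas; the inline sample rows below are stand-ins so the snippet runs anywhere, while the real call would read labels.csv from your data path:

```python
import io
import pandas as pd

# labels.csv has two columns: id (image file name without extension) and breed.
# A tiny stand-in for the real file, so this sketch runs anywhere:
csv_text = io.StringIO(
    "id,breed\n"
    "000bec180eb18c7604dcecc8fe0dba07,boston_bull\n"
    "001513dfcb2ffafc82cccf4d8bbaba97,dingo\n"
    "001cdf01b096e06d78e9e5112d419397,pekinese\n"
)
label_df = pd.read_csv(csv_text)  # for the real data: pd.read_csv(f'{PATH}labels.csv')
print(label_df.head())
```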

It contains the image_id of each dog image and the breed (label) it belongs to. labels.csv has the breeds for the images in the training dataset; keeping all the labels in one csv instead of one folder per breed makes life easier.

Let’s check how many dogs there are of each breed.
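This is a one-liner with pandas value_counts(); the tiny DataFrame below is a stand-in for the real labels.csv:

```python
import pandas as pd

# Toy stand-in for label_df loaded from labels.csv:
label_df = pd.DataFrame({'breed': ['scottish_deerhound', 'maltese_dog',
                                   'scottish_deerhound', 'afghan_hound',
                                   'scottish_deerhound', 'maltese_dog']})

# Count images per breed, most common first:
counts = label_df.breed.value_counts()
print(counts)
# For the real labels.csv this lists all 120 breeds in descending order.
```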

The output above shows the 120 dog breeds with the count of images for each breed, in descending order. Sorry, I could not fit all the breeds in one snapshot.

Generally we have a train, validation and test dataset. We train our model on the training dataset and evaluate it on the validation dataset as we go. This involves tuning parameters to increase the accuracy on the validation dataset. Finally, when we are convinced that our model is fine, we use it to predict on the unseen dataset, i.e. the test dataset. This process helps in preventing overfitting.

validation dataset done right

Line #1 above sets the path to the labels.csv file. Line #2 opens the file and counts the number of rows excluding the header, hence the minus 1. That gives us the number of rows, i.e. the number of images listed in the csv file. get_cv_idxs(n) in line #3 returns a random 20% of the data to be used as the validation dataset — specifically, the indices of the files we are going to use for validation. Let’s crosscheck this.

!!! Seems legit !!! The validation dataset size is indeed 20% of the total dataset size.
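Under the hood, get_cv_idxs just draws a random val_pct fraction of the row indices; a rough numpy equivalent (simplified from the fastai 0.7 behaviour, with the seed and defaults assumed):

```python
import numpy as np

def get_cv_idxs(n, val_pct=0.2, seed=42):
    """Return a random val_pct fraction of the indices 0..n-1 (fastai-style sketch)."""
    np.random.seed(seed)
    n_val = int(val_pct * n)
    return np.random.permutation(n)[:n_val]

n = 10222                                # rows in labels.csv, header excluded
val_idxs = get_cv_idxs(n)
print(len(val_idxs), len(val_idxs) / n)  # ~20% of the data
```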

Now we will use the pretrained resnext_101_64 architecture to build our model.

At the moment of writing this blog, the weights of the pretrained resnext_101_64 architecture aren’t present in the fastai library, so we have to download them to '/usr/local/lib/python3.6/dist-packages/fastai/weights' before running our model, or else it throws an error saying the weights are not found. Follow the steps explained for the code below.

Steps to get the pretrained model weights:-

  • Download the pretrained weights to any location using the link mentioned in line #3.
  • Move them to the above-said location (line #20).
  • Make that location your current directory (line #23).
  • Unzip the file (line #27).
  • Go back to the location where you have the data (line #41).
resnext101 weights

Note:- In future, if the fastai code gets updated, the above steps might be taken care of automatically, making this step optional.

Before proceeding further, decide what the size of the images should be, which architecture to use, and what batch size to consider.

4. SETTING UP THE DATA AS PER FASTAI FORMAT

To set up the data in fastai format, we write the following code. Note that earlier we used ImageClassifierData.from_paths() for the Dog vs Cat classifier, as we had the data in separate folders. In that case the names of the folders were the names of the labels.

But here the images are present in the train and test folders, and the filenames with their labels are summarized in the labels.csv file. The labels.csv file has the label/breed for each image in the training dataset, so we go for ImageClassifierData.from_csv(...) as shown below.

PARAMETERS FOR ImageClassifierData.from_csv(...) ARE:

  • PATH is the root path of the data (used for storing trained models, precomputed values, etc.). It also contains all of the data.
  • 'train' — the folder that contains the training data.
  • labels.csv — the file that has the labels for the different dog images.
  • val_idxs — the indices of the rows in labels.csv that have been put into the validation dataset.
  • test_name='test' — the folder that contains the test dataset.
  • The file names actually have a .jpg extension which is not mentioned in the labels.csv file, hence suffix='.jpg'. This appends .jpg to the end of the file names.
  • tfms — the transformations we are going to apply for data augmentation.
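Putting those parameters together, the call looks roughly like this (fastai 0.7 API; arch, sz, bs and val_idxs come from the earlier steps, so this is an illustrative sketch rather than a standalone runnable snippet):

```python
# fastai 0.7-style call (needs the old fastai library and the data on disk):
label_csv = f'{PATH}labels.csv'
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(PATH, 'train', label_csv,
                                    test_name='test', val_idxs=val_idxs,
                                    suffix='.jpg', tfms=tfms, bs=bs)
```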

The data object has been created above. Using it we can check train_ds (the training dataset). To know what else can be accessed through the data object, write data. and press Tab; a dropdown menu will appear showing its attributes. fnames, shown below, gives the filenames present in the training dataset.

Let’s check whether the images are located in the correct place:-

The output image is shown below. As we can see, the dog takes up most of the frame, so we need not worry about cropping or zooming techniques during the transformation (tfms) phase.

Line #6 below maps each file name to the size of that image and stores the result in size_d. size_d is a dictionary where the keys are file names and the values are the dimensions of each file. Line #8 uses zip(*), which unzips the (row, column) pairs and saves them in row_sz and col_sz.
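The zip(*...) idiom is worth a closer look; with a toy size_d it behaves like this:

```python
# Toy stand-in for size_d: file name -> (rows, cols) of each image.
size_d = {
    'train/a.jpg': (500, 375),
    'train/b.jpg': (375, 500),
    'train/c.jpg': (499, 500),
}

# zip(*pairs) "unzips": one tuple of all row sizes, one of all column sizes.
row_sz, col_sz = zip(*size_d.values())
print(row_sz)   # (500, 375, 499)
print(col_sz)   # (375, 500, 500)
```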

The size of the image is :-

Check out the size of the train dataset and test dataset in line #1 below, and the number of data classes/dog breeds and the first five breeds in line #5.

Before moving further, let’s check the dimensions of the data we are modelling with. This is just for inspection purposes. Hence:-

In the histogram above we can see that we have around 5000 images with dimensions around 500 pixels, and a few images bigger than 1000 pixels. In the histogram below we check only the images with dimensions below 1000 pixels. It also spits out how many images there are of each dimension.
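The same counting can be sketched without a plot using numpy (the image sizes below are made up for illustration):

```python
import numpy as np

# Made-up image heights standing in for the row_sz computed above:
row_sz = np.array([450, 500, 375, 2000, 499, 1200, 333, 500])

# Restrict to images under 1000 pixels, then bucket them:
small = row_sz[row_sz < 1000]
counts, edges = np.histogram(small, bins=5)
print(counts)    # images per size bucket
print(edges)     # bucket boundaries in pixels
# With matplotlib this is just: plt.hist(row_sz[row_sz < 1000])
```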

5. BUILDING UP A STATE OF THE ART CLASSIFIER.

The wait is over. Finally, presenting before you: The State of the Art Classifier.

There are a couple of steps to follow for a State of the Art Classifier. They are as follows:-

5.1) ENABLE DATA AUGMENTATION AND SET PRECOMPUTE=TRUE.

For consistency, let’s resize the data. get_data() has just a couple of lines of code: the first sets up data augmentation and the other formats the data, and we pass image_size and batch_size to the function. When we start working on a new dataset, everything goes super fast if we use small images first. Hence we start with sz=224 and bs=64, and later we can increase the size. If on increasing the size we see a CUDA Out of Memory error, restart the kernel, set the batch size to something smaller and run it again.

Now let’s set up the Neural Network with precompute=True:-

When declaring the architecture using ConvLearner.pretrained(…), precompute is set to True, which tells it to use the activations from the pretrained network. A pretrained network is one which has already learnt to recognize certain things. For our dog breed identification case study, the pretrained network used (RESNEXT101_64) has already learned to classify 1000 classes on the 1.2 million images of the ImageNet dataset. So we take the penultimate layer (as this is the layer which has all the information necessary to figure out what the image is) and save its activations. Convolutional neural networks have these things called “activations”: rich features, where an activation is a number that says “this feature is in this place with this level of confidence (probability)”. We save these activations for each image, and these are known as precomputed activations. Now, when creating a new classifier, we can take advantage of these precomputed activations and quickly train a model on top of them. Hence we set precompute=True.
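The idea can be boiled down to a toy numpy sketch (not the fastai implementation; the backbone here is a fake stand-in): run the frozen backbone once per image, cache the activations, and train only the small head on that cache:

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(img):
    """Stand-in for the pretrained, frozen conv layers: image -> feature vector."""
    return img.reshape(-1)[:8]           # pretend penultimate-layer activations

images = [rng.random((4, 4)) for _ in range(100)]

# Precompute once: this is the expensive pass we never repeat while frozen.
precomputed = np.stack([frozen_backbone(im) for im in images])

# Training the new head now only touches the cached activations,
# which is why epochs with precompute=True are so fast.
labels = rng.integers(0, 2, size=100)
w = np.zeros(8)
for _ in range(10):                       # a few crude gradient steps
    preds = 1 / (1 + np.exp(-precomputed @ w))
    w -= 0.1 * precomputed.T @ (preds - labels) / len(labels)
print(precomputed.shape)                  # (100, 8): one cached vector per image
```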

The pretrained method creates our new Neural Network from the arch model. In doing so it does two things:

  • It keeps all the layers except the last one (the output layer, which gives probabilities over the 1000 ImageNet classes).
  • The last layer is replaced by a few new layers ending with an output layer that gives probabilities for all 120 dog breed classes.

So initially everything is frozen and precompute=True, hence all we are learning is the layers we have added. With precompute=True, data augmentation doesn’t do anything, because we are showing exactly the same activations each time. What precompute=True does is pre-calculate how much the image has something that looks like each feature. Precomputed activations are the outputs of the activation functions in the frozen layers that we don’t intend to train. This speeds up the training of the newly added fully connected layers at the end. We only precompute the activations of the penultimate layer of the network; doing it for all the layers would be storage intensive.

5.2) USE lr_find() TO FIND THE HIGHEST LEARNING RATE WHERE LOSS IS STILL CLEARLY IMPROVING

The command below helps in finding the best learning rate.

This command results in the graph shown below, which shows that the learning rate increases with the number of iterations.

This command plots loss vs learning rate. The result shows that as we increase the learning rate, the loss drops to a minimum, and then there is a point after which it overshoots the minimum, so the loss grows again. We should not pick the learning rate corresponding to the minimum loss: at that point the learning rate is already too high. Instead we go back one step on the learning rate scale from the minimum loss point and pick that as the best learning rate. Here it is 0.01.
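That picking rule is easy to express in code; the loss curve below is a made-up stand-in for the lr_find() output:

```python
import numpy as np

# Toy lr_find() output: learning rates swept upward and the loss at each.
lrs    = np.array([1e-4, 1e-3, 1e-2, 1e-1, 1.0])
losses = np.array([0.90, 0.70, 0.40, 0.30, 2.50])

# Don't take the lr at the minimum loss -- it is already too high.
# Step back one notch on the lr scale from the minimum instead:
best_lr = lrs[np.argmin(losses) - 1]
print(best_lr)   # 0.01 for this toy curve
```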

5.3) TRAIN LAST LAYER FROM PRECOMPUTED ACTIVATIONS FOR A COUPLE OF EPOCHS.

Here we choose the best learning rate (0.01) and train the last layers of the NN (since precompute=True and the earlier layers are frozen), i.e. their weights are updated in order to minimize the loss of the model. Using this technique we reach an accuracy of 92%. But there is a little bit of overfitting, as our validation loss is higher than our training loss. To avoid that we introduce Dropout, Data Augmentation and training on larger images.

5.4) TRAIN LAST LAYER WITH DATA AUGMENTATION ON (i.e. PRECOMPUTE=FALSE) FOR 2–3 EPOCHS WITH CYCLE_LEN=1.

  • DROPOUT

ps is the dropout parameter. It refers to dropping out 50% of the neurons at random. This prevents the Neural Network from over-learning and hence prevents overfitting. At this point our accuracy has dropped a bit to 91.6%, but the good news is our validation loss is now lower than our training loss, which is a clear sign that we aren’t overfitting.
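Conceptually, dropout zeroes each activation with probability p at training time and rescales the survivors; an illustrative numpy sketch (not fastai’s implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5):
    """Zero each activation with probability p and rescale the survivors."""
    mask = rng.random(x.shape) >= p      # keep each unit with probability 1 - p
    return x * mask / (1 - p)            # "inverted dropout" preserves the scale

acts = np.ones(10)
print(dropout(acts, p=0.5))   # roughly half the entries are 0, the rest are 2.0
```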

  • DATA AUGMENTATION

To further improve the model we need more data, hence we turn on data augmentation by setting learn.precompute=False. With precompute=False we are still only training the layers we added at the end, because the rest is frozen, but data augmentation now works, because the activations are recalculated from scratch on each pass. The concept of cycle_len is discussed in detail in the previous episode.

Post data augmentation, we see an increase in accuracy to 92.17% without any overfitting.

Note:- An epoch is one pass through the data, and a cycle is the number of epochs in one cycle. So here, with cycle_len=1, a cycle is basically the same as an epoch.

  • TRAINING ON LARGER IMAGES IS THE BEST WAY TO PREVENT OVERFITTING.

Now we see a case of underfitting, as the validation loss is much lower than the training loss. Our main aim should be to keep val_loss and trn_loss as close as possible, while at the same time keeping an eye on accuracy.

cycle_len=1 may be too short. Let’s set cycle_mult=2 to find better parameters. This will help prevent underfitting. When we are underfitting, it means cycle_len=1 is too short: the learning rate gets reset before it has had the chance to zoom in properly and settle on the best parameters. The concept of cycle_mult is discussed here in detail.
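With cycle_len=1 and cycle_mult=2, successive restart cycles last 1, 2, 4, … epochs; a quick sketch of the arithmetic (the fit call in the comment is illustrative):

```python
def cycle_epochs(n_cycles, cycle_len=1, cycle_mult=2):
    """Epochs in each SGDR cycle: cycle_len, cycle_len*mult, cycle_len*mult^2, ..."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]

lens = cycle_epochs(3)     # as in learn.fit(lr, 3, cycle_len=1, cycle_mult=2)
print(lens, sum(lens))     # [1, 2, 4] 7 -- seven epochs in total
```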

Let’s train it for a couple more epochs.

!!! But wait !!!

If you give the output a closer look, our val_loss is still slightly higher than trn_loss. Does that mean it’s overfitting, and that this is bad for our Neural Network? Let’s go for an expert opinion on why a little bit of overfitting is okay. Check out the link below.

Besides that, this dataset is similar to the ImageNet dataset, so training the convolutional layers doesn’t help much. Hence we are not going to unfreeze all the layers. Finally, we do TTA (Test Time Augmentation) and get the prediction probabilities.
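Conceptually, TTA predicts on the original image plus a few augmented versions and averages the probabilities; a toy sketch with made-up predictions:

```python
import numpy as np

# Predicted class probabilities for one image: original + 4 augmented crops/flips.
aug_preds = np.array([
    [0.70, 0.30],
    [0.60, 0.40],
    [0.80, 0.20],
    [0.65, 0.35],
    [0.75, 0.25],
])

# TTA: average over the augmented predictions for a steadier final answer.
tta_pred = aug_preds.mean(axis=0)
print(tta_pred)   # approximately [0.7 0.3]
```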

In the last step we calculate the accuracy and the loss. It’s good to see that our model gets 93.3% accuracy on the test dataset, which is simply mind-blowing.

And That’s how we get State Of the Art Result my friend .

P.S.: This blog post will be updated and improved as I continue with the other lessons. In case you are interested in the source code, check it out here.

A B C. Always be clapping. 👏 👏👏👏👏😃😃😃😃😃😃😃😃😃👏 👏👏👏👏 👏

If you have any questions, feel free to reach out on the fast.ai forums or on Twitter:@ashiskumarpanda
