Guinea Pig Breed Classification


Model Building

For the rest of this article, you can follow along with this Google Colab notebook. It is an abridged version of the steps I took to complete the project.

All you have to do is make a new subfolder called ‘cavy_breed_clf’ under your Google Drive’s ‘Colab Notebooks’ folder. Copy the following notebook and folders into ‘cavy_breed_clf’ and you should be good to go.

However, if you would like to see the full picture of how I got there, I have also sorted my scripts and notebooks in numerical order. I hope this provides a clearer sense of the overall flow: data preparation, training (classical ML, then neural network) and metrics/predictions.

The following were the steps I took to build a working model capable of recognizing cavy breeds reasonably well.

  • Step 0. Project Folder Formatting
  • Step 1. Raw Image Scraping
  • Step 2. Dataset Creation
  • Step 3. Image Data Preprocessing
  • Step 4. Baseline Model Training
  • Step 5. Deep Learning Model Training
  • Step 6. Model Comparison and Selection

Step 0. Project Folder Formatting

This was the first project where I made the effort to organize my code and documentation based on suggestions from here and here.

  • /data — project datasets: raw, training, validation and testing
  • /images — raw images are stored in sub-folders organized by breed, plus a sample sub-folder for ad hoc images beyond the datasets
  • /lib/data_common.py — image data preprocessing routines
  • /lib/ml_common.py — classical ML handling processes, including classification metrics
  • /lib/nn_common.py — neural network DL handling processes

Step 1. Raw Image Scraping

Having a lot of good, clean data of the right type is very important. Collecting it can be the most painful part of an ML project, but also the most crucial: it ensures your model does what it set out to do, or at least yields a presentable MVP. During the bootcamp, we truly learnt this lesson the hard way.

AFAIK, there was no public cavy image dataset available for machine learning. Fortunately, there are many useful image search engines out there, based on this article. I relied mostly on Google, Yahoo and Bing.

I was able to scrape 500 to 1000 raw images for each breed to start off with. I depended heavily on this wonderful Chrome extension Fatkun Batch Download Image to batch download the images.

Mind you, 1000 images per breed might sound like a lot, but it was not. These were just raw, unlabeled images. I had to pore through all 4000+ of them, during and after downloading, to filter out unqualified candidates:

  • Mislabeled animal species or cavy breed
  • Missing cavy in the image (yes, even search engines are not perfect)
  • Multiple cavies (better for the model to learn from just one per image)
  • Caricatures instead of real photos
  • Important breed features not in full display (e.g. rosettes on an Abyssinian)
  • Resolution was too low
  • Too out of focus

Eventually, I trimmed it down to 1600+ images:

  • Abyssinian: 553
  • American: 519
  • Silkie: 267
  • Skinny: 292

It was a moderately unbalanced dataset, but I believed it would not significantly impact the training of the models.

Step 2. Dataset Creation

After I had collected the required images, I sorted them into the respective sub-folders. Then, I ran a short script (also available as a notebook) to generate these datasets: raw, training, validation and testing. The latter three would become the input data for all subsequent ML training, validation and testing.
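For illustration, here is a minimal sketch of what such a split could look like with pandas and scikit-learn. The 70/15/15 ratio, file paths and column names are my assumptions for this sketch, not the exact values from my script.

```python
# A minimal sketch of a stratified train/validation/test split, assuming
# images are sorted into /images/<breed>/ sub-folders. Ratios and paths
# are illustrative.
from pathlib import Path
import pandas as pd
from sklearn.model_selection import train_test_split

IMAGE_ROOT = Path('images')
BREEDS = ['abyssinian', 'american', 'silkie', 'skinny']

# Build the raw dataset: one row per image file, labeled by its sub-folder.
rows = [(str(p), breed) for breed in BREEDS
        for p in (IMAGE_ROOT / breed).glob('*.jpg')]
raw = pd.DataFrame(rows, columns=['filepath', 'breed'])

# Stratify on breed so each split keeps the same class proportions.
train, rest = train_test_split(raw, test_size=0.3, stratify=raw['breed'],
                               random_state=42)
val, test = train_test_split(rest, test_size=0.5, stratify=rest['breed'],
                             random_state=42)

# Quick sanity check: breed distribution across the three datasets.
for name, df in [('train', train), ('val', val), ('test', test)]:
    print(name, df['breed'].value_counts(normalize=True).round(2).to_dict())
```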

A quick sanity check to ensure the breeds were distributed evenly across the datasets.

Another check to be sure the features and targets were in good order.

Step 3. Image Data Preprocessing

Each image file was transformed into an ML-ready format by resizing it to a specific shape (150, 150) and then flattening it into a 1-D array. These were saved as the features (X_train, X_val, X_test).

The breed of each image was one-hot encoded to become the labels (y_train, y_val, y_test).
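A minimal sketch of this preprocessing, assuming Pillow and Keras utilities (the actual routines live in lib/data_common.py, and the [0, 1] pixel scaling is my assumption):

```python
# Sketch of the preprocessing described above; the real routines live
# in lib/data_common.py.
import numpy as np
from PIL import Image
from tensorflow.keras.utils import to_categorical

IMG_SHAPE = (150, 150)
BREED_TO_IDX = {'abyssinian': 0, 'american': 1, 'silkie': 2, 'skinny': 3}

def image_to_features(filepath):
    """Resize an image to (150, 150) and flatten it into a 1-D array."""
    img = Image.open(filepath).convert('RGB').resize(IMG_SHAPE)
    return np.asarray(img, dtype='float32').ravel() / 255.0  # shape: (67500,)

def breeds_to_labels(breeds):
    """One-hot encode breed names into label vectors."""
    return to_categorical([BREED_TO_IDX[b] for b in breeds],
                          num_classes=len(BREED_TO_IDX))

# X_train = np.stack([image_to_features(fp) for fp in train['filepath']])
# y_train = breeds_to_labels(train['breed'])
```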

A quick visual sanity check to make sure it was all good:

Step 4. Baseline Model Training

I decided right from the start that the Random Forest classifier would be my baseline model. Nevertheless, I took this as a learning opportunity to see how well other classical classifiers would fare against it.

I created a routine (Vanilla_ML_Run) to train these four classifiers together: Dummy, Gaussian Naive Bayes, Logistic Regression and Random Forest.
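Here is a minimal sketch of what such a routine could look like; the real implementation lives in lib/ml_common.py, and the hyperparameters and joblib persistence are my assumptions for illustration.

```python
# A sketch of a Vanilla_ML_Run-style routine. Hyperparameters and
# joblib persistence are illustrative assumptions.
import joblib
from sklearn.dummy import DummyClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

CLASSIFIERS = {
    'dummy': DummyClassifier(strategy='stratified'),
    'gaussian_nb': GaussianNB(),
    'logistic_regression': LogisticRegression(max_iter=1000),
    'random_forest': RandomForestClassifier(n_estimators=100),
}

def vanilla_ml_run(X_train, y_train):
    """Train each classical classifier on the flattened image features
    and persist it to disk for later comparison."""
    # scikit-learn expects integer labels; convert one-hot targets with
    # y_train.argmax(axis=1) before calling this routine.
    for name, clf in CLASSIFIERS.items():
        clf.fit(X_train, y_train)
        joblib.dump(clf, f'{name}.joblib')
```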

Once the trained classifiers had been saved and reloaded, I compared their performances using classification reports, confusion matrices, and PR and ROC curves.

NB: I highly recommend reading this and this for a good grasp of PR and ROC curves, and also this on micro-average vs macro-average.
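As a rough sketch, this is how such a comparison could be wired up with scikit-learn and matplotlib. The micro-averaging shown is one of several options, and the plotting details in my notebooks differ.

```python
# Sketch of the per-model comparison: classification report plus
# micro-averaged PR and ROC curves. Plotting details are simplified.
import matplotlib.pyplot as plt
from sklearn.metrics import (classification_report, precision_recall_curve,
                             roc_curve, auc)

def evaluate(clf, X_test, y_test_onehot, class_names):
    y_scores = clf.predict_proba(X_test)          # shape: (n_samples, 4)
    y_pred = y_scores.argmax(axis=1)
    y_true = y_test_onehot.argmax(axis=1)
    print(classification_report(y_true, y_pred, target_names=class_names))

    # Micro-average: flatten all classes into one binary problem.
    prec, rec, _ = precision_recall_curve(y_test_onehot.ravel(),
                                          y_scores.ravel())
    fpr, tpr, _ = roc_curve(y_test_onehot.ravel(), y_scores.ravel())

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(rec, prec)
    ax1.set_title('Micro-average PR curve')
    ax1.set_xlabel('Recall'); ax1.set_ylabel('Precision')
    ax2.plot(fpr, tpr, label=f'AUC = {auc(fpr, tpr):.2f}')
    ax2.set_title('Micro-average ROC curve')
    ax2.set_xlabel('FPR'); ax2.set_ylabel('TPR'); ax2.legend()
    plt.show()
```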

Step 4a. Gaussian NB Classifier

Needless to say, the Dummy classifier did not do very well, and apparently neither did the GaussianNB classifier. Precision and recall scores for ‘Silkie’ and ‘Skinny’ were quite low. Let’s take ‘Skinny’ as an example:

  • Low precision meant that many of the images it predicted as ‘Skinny’ were actually other breeds (false positives)
  • Low recall meant that many actual ‘Skinny’ images were wrongly identified as other breeds (false negatives)

And the confusion matrix confirmed that understanding.
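To make the link between the confusion matrix and these scores concrete, here is a tiny illustration with made-up counts (not my actual results):

```python
# Tiny illustration (made-up counts) of how per-class precision and
# recall are read off a confusion matrix.
import numpy as np

# Rows = actual breed, columns = predicted breed.
labels = ['Abyssinian', 'American', 'Silkie', 'Skinny']
cm = np.array([[80, 15,  3,  2],
               [10, 70,  5, 15],
               [ 5, 10, 30, 15],
               [ 8, 20, 12, 20]])

for i, breed in enumerate(labels):
    precision = cm[i, i] / cm[:, i].sum()  # true positives / all predicted
    recall = cm[i, i] / cm[i, :].sum()     # true positives / all actual
    print(f'{breed}: precision={precision:.2f}, recall={recall:.2f}')
# 'Skinny' here scores precision 0.38 and recall 0.33: most images
# predicted as 'Skinny' are other breeds, and most actual 'Skinny'
# images are labeled as something else.
```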

Step 4b. Logistic Regression Classifier

This classifier performed much better than GaussianNB, with most precision and recall scores hovering at or above 50.

However, the PR curve told a clearer story: this model did not perform well for ‘Skinny’ in particular. Its area under the curve was far smaller than the others’, and its curve deviated furthest from the micro-average curve too.

Step 4c. Random Forest Classifier

Random Forest appeared to be a slightly better model than Logistic Regression. It looked like the overall higher scores were achieved at the expense of low recall for ‘Silkie’ and ‘Skinny’.

The F1 scores and the PR curve indicated strongly that this model did not perform well for the smaller classes (‘Silkie’ and ‘Skinny’).

Step 5. Deep Learning Model Training

Now that we had our baseline model trained and reviewed, we could proceed with two different DL models, namely:

  • Multi-layer image CNN (Convolutional Neural Network)
  • Transfer Learning from Inception V3 using pre-trained weights on ImageNet

Once trained, I would compare the model performances with the same metrics as before.

Step 5a. Multi-layer Image CNN Building

The architecture was fairly straightforward for an image CNN model; you can find out more about it here.
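To give a sense of the shape of such a model: a stack of Conv/MaxPool blocks feeding dense layers. The filter counts, dense size and dropout rate below are illustrative assumptions, not my exact layers.

```python
# A sketch of a straightforward multi-layer image CNN for (150, 150, 3)
# inputs and four breeds. Layer sizes are illustrative.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(512, activation='relu'),
    layers.Dense(4, activation='softmax'),  # one output per breed
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # matches one-hot labels
              metrics=['accuracy'])
```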

After training for 100 epochs, the model achieved a training accuracy of 78% and a validation accuracy of 67%. It was definitely more accurate than the baseline model, but I believed it was already overfitting.

Let’s look at the classification report. The overall scores were convincingly better than the baseline model’s. Surprisingly, ‘Skinny’ was the best performer of them all. I didn’t expect that!

The precision and recall scores for the other breeds were interestingly uneven, and the same went for the PR and ROC curves. ‘Skinny’ stood out very clearly from the other classes. It sure was cool to have trained a model capable of telling apart a hairless cavy from those with hair. 🙂

Step 5b. Transfer Learning from InceptionV3 (ImageNet)

This model was built using the transfer learning technique illustrated here, with weights pretrained on ImageNet. For this specific model, I chose to freeze the first 230 layers.
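A minimal sketch of this setup in Keras follows; the frozen 230 layers and ImageNet weights match the description above, while the classification head and learning rate are my assumptions.

```python
# Sketch of the transfer learning setup: InceptionV3 with ImageNet
# weights, first 230 layers frozen, and a new classification head.
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import InceptionV3

base = InceptionV3(weights='imagenet', include_top=False,
                   input_shape=(150, 150, 3))
for layer in base.layers[:230]:
    layer.trainable = False  # freeze the first 230 layers

# New head for the four cavy breeds (architecture is an assumption).
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(512, activation='relu')(x)
outputs = layers.Dense(4, activation='softmax')(x)

model = models.Model(inputs=base.input, outputs=outputs)
model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy', metrics=['accuracy'])
```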

After training for 100 epochs, the model achieved 99% training accuracy and 82% validation accuracy. It was definitely overfitting, and the choppy, fluctuating validation curve seemed to support that as well. It could also mean that my data was still not enough.

When I looked at the classification report, precision and recall scores were extremely good overall, except for three very specific ones. It seemed like the model tended to mislabel ‘Abyssinian’ and ‘Silkie’ as ‘American’.

The confusion matrix also supported this observation: the model was enthusiastically mislabeling ‘Abyssinian’ (31 images) and ‘Silkie’ (9) as ‘American’.

Intuitively, this made some sense. Unlike ‘Skinny’, both ‘Abyssinian’ (short-haired) and ‘Silkie’ (smooth-coated) could easily be mistaken for an ‘American’ when viewed from certain angles.

NB: Of course, it could also be a simple case of badly labeled raw images, something I would definitely want to look into to be sure.

Step 6. Model Comparison and Selection

The InceptionV3 transfer learning model had the best F1 score of 84. When we compared the PR curves of all the models together, it became clear that we had a true winner.

Similarly, the transfer learning model led with the best ROC curve too.

Hence, the final model trained through transfer learning was clearly the best candidate. Just for fun, I even used it to predict out-of-scope images kept in the subfolder ‘/images/new_samples’.
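A sketch of what that ad hoc prediction loop could look like, assuming the same (150, 150) preprocessing as training and [0, 1] pixel scaling:

```python
# Sketch of predicting breeds for ad hoc images in /images/new_samples.
# Preprocessing mirrors training: resize to (150, 150), scale to [0, 1].
from pathlib import Path
import numpy as np
from PIL import Image

BREEDS = ['Abyssinian', 'American', 'Silkie', 'Skinny']

for fp in sorted(Path('images/new_samples').glob('*.jpg')):
    img = Image.open(fp).convert('RGB').resize((150, 150))
    batch = np.asarray(img, dtype='float32')[None, ...] / 255.0
    probs = model.predict(batch)[0]           # trained transfer model
    print(f'{fp.name}: {BREEDS[probs.argmax()]} ({probs.max():.0%})')
```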