Transfer Learning-Rock Paper Scissors Classifier

Original article was published by Farhan Rahman on Deep Learning on Medium


Using existing models for image classification


How to use transfer learning for classifying images

Photo by Marcus Wallis on Unsplash

Growing up, building things with Lego has always been fun, and so is building a machine learning algorithm from scratch. Classical machine learning algorithms are sufficient for many applications, but when it comes to large datasets and image classification we need something more powerful, which is where deep learning comes into the picture. Building a model from scratch is instructive but time consuming, so why not reuse existing models that were trained on similar data? The process of taking the knowledge gained while solving one problem and applying it to a different but related problem is called Transfer Learning. Let's get a better picture of how we can use a really powerful convolutional neural network on our own dataset.

Import dependencies

As usual, before starting any machine learning problem we need to import the dependencies and libraries that lay the foundation for the entire model.
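A minimal version of that setup might look like this (TensorFlow, TensorFlow Datasets, NumPy, and Matplotlib are assumed to be installed):

```python
# Core libraries: numerical arrays, plotting, deep learning, and datasets
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
import tensorflow_datasets as tfds  # provides the rock_paper_scissors dataset
```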

Import data

The data we will be using are computer-generated images of hands showing the different poses for rock, paper, and scissors. The "rock_paper_scissors" dataset is available directly through TensorFlow Datasets. In the cells that follow, we'll get the data, plot a few examples, and also do some pre-processing.
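A sketch of loading the data with TensorFlow Datasets (the variable names here are just for illustration):

```python
# Load the train and test splits of the computer-generated hand images,
# together with metadata such as the class names.
(ds_train, ds_test), ds_info = tfds.load(
    'rock_paper_scissors',
    split=['train', 'test'],
    as_supervised=True,   # yield (image, label) pairs
    with_info=True,
)

class_names = ds_info.features['label'].names
print(ds_info.splits['train'].num_examples, 'training images')
print(ds_info.splits['test'].num_examples, 'test images')
```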

To see what our dataset looks like, run the following cell.
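One possible version of that cell, assuming the `ds_train` and `class_names` objects defined above:

```python
# Display a 3x3 grid of sample images with their labels
plt.figure(figsize=(8, 8))
for i, (image, label) in enumerate(ds_train.take(9)):
    plt.subplot(3, 3, i + 1)
    plt.imshow(image.numpy())
    plt.title(class_names[label.numpy()])
    plt.axis('off')
plt.show()
```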

Pre-process dataset

Even though the images for our analysis are already cropped, we still need to resize and pre-process them before they can be used as suitable input for our models. Since the existing models have constraints on the size of the input image, we need to reshape our dataset. For that we will use the following code block.
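A sketch of that block, assuming MobileNetV2's 224×224 input size and its expected [-1, 1] pixel range (the batch size is just an example):

```python
IMG_SIZE = 224

def preprocess(image, label):
    # Resize to the input size MobileNetV2 expects and rescale pixel
    # values from [0, 255] to the [-1, 1] range it was trained on.
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = tf.keras.applications.mobilenet_v2.preprocess_input(image)
    return image, label

train_batches = ds_train.map(preprocess).shuffle(1000).batch(32).prefetch(1)
test_batches = ds_test.map(preprocess).batch(32).prefetch(1)
```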

We’ll convert to numpy format again:
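One way to do that, collecting images and labels in a single pass so they stay aligned (`to_numpy` is a helper introduced here, not a library function):

```python
def to_numpy(batched_ds):
    # Gather all batches into two aligned NumPy arrays
    images, labels = [], []
    for x, y in batched_ds:
        images.append(x.numpy())
        labels.append(y.numpy())
    return np.concatenate(images), np.concatenate(labels)

X_train, y_train = to_numpy(train_batches)
X_test, y_test = to_numpy(test_batches)
print(X_train.shape, y_train.shape)  # (num_train_images, 224, 224, 3) (num_train_images,)
```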

Upload custom test sample

Now we can upload a custom input image for classification using the following cell.
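If you are working in Google Colab, that cell might look like the sketch below (the uploaded file name is whatever you choose; `my_hand.jpg` is just a placeholder):

```python
from google.colab import files  # only available inside Colab notebooks

uploaded = files.upload()        # opens a file picker, e.g. for 'my_hand.jpg'
filename = next(iter(uploaded))  # name of the uploaded file

# Load the photo, resize it, and apply the same preprocessing as the dataset
img = tf.keras.preprocessing.image.load_img(filename, target_size=(IMG_SIZE, IMG_SIZE))
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.keras.applications.mobilenet_v2.preprocess_input(img_array)
img_batch = np.expand_dims(img_array, axis=0)  # add a batch dimension
```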

Classify with MobileNetV2

Keras Applications are pre-trained models with saved weights that we can download and use without any additional training.

The Keras Applications documentation includes a table of the available models. In that table, the top-1 and top-5 accuracy refer to the model's performance on the ImageNet validation dataset, and depth is the depth of the network including activation layers, batch normalization layers, etc.

(A variety of other models is available from other sources — for example, the Tensorflow Hub.)

For our analysis we will use MobileNetV2, which is designed specifically to be small and fast (so it can run on mobile devices!). MobileNets come in various sizes, controlled by a multiplier for the depth (number of features), and are trained for various sizes of input images. We will use the 224×224 input image size.

Let's see what the top 5 predicted classes for our test image are:
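A sketch of that step, using the full ImageNet-trained MobileNetV2 and the `img_batch` array prepared earlier:

```python
# Full MobileNetV2, including its 1000-class ImageNet classification head
imagenet_model = tf.keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
    weights='imagenet',
)

predictions = imagenet_model.predict(img_batch)

# Convert raw probabilities into human-readable ImageNet labels
top5 = tf.keras.applications.mobilenet_v2.decode_predictions(predictions, top=5)[0]
for _, name, prob in top5:
    print(f'{name}: {prob:.3f}')
```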

MobileNetV2 is trained on a specific task: classifying the images in the ImageNet dataset by selecting the most appropriate of 1000 class labels. It is not trained for our specific task: classifying an image of a hand as rock, paper, or scissors. As a result, we get a classification that is nowhere close to our desired result, so we need to fine-tune our base model to get predictions over our desired classes.

Background: fine-tuning a model

When we talk about a convolutional neural network, we are considering a stack of layers between the input and the output; a typical convolutional neural network is structured as follows:

We have a sequence of convolutional layers followed by pooling layers. These layers are feature extractors that “learn” key features of our input images.

Then, we have one or more fully connected layers followed by a fully connected layer with a softmax activation function. This part of the network is for classification.

The key idea behind transfer learning is that the feature extractor part of the network can be re-used across different tasks and different domains.

This is especially useful when we don’t have a lot of task-specific data. We can get a pre-trained feature extractor trained on a lot of data from another task, then train the classifier on task-specific data.

The general process is:

  • Get a pre-trained model, without the classification layer.
  • Freeze the base model.
  • Add a classification layer.
  • Train the model (only the weights in your classification layer will be updated).
  • (Optional) Un-freeze some of the last layers in our base model.
  • (Optional) Train the model again, with a smaller learning rate.

Train our own classification head

This time, we will get the MobileNetV2 model without the fully connected layer at the top of the network.

Then, we will freeze the model. We're not going to train the MobileNetV2 part of the model; we're just going to use it to extract features from the images.

We’ll make a new model out of the “headless” already-fitted MobileNetV2, with a brand-new, totally untrained classification head on top:
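A hedged sketch of that model; the dropout rate and the pooling layer are common choices rather than anything prescribed:

```python
# Feature extractor: MobileNetV2 without its ImageNet classification head
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
    include_top=False,
    weights='imagenet',
)
base_model.trainable = False  # freeze the pre-trained weights

# New classification head for our 3 classes: rock, paper, scissors
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation='softmax'),
])

model.summary()
```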

We’ll compile the model and use data augmentation as well:
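One possible setup, compiling with integer (sparse) labels and augmenting the in-memory training arrays with `ImageDataGenerator`; the specific augmentations and learning rate are examples only:

```python
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',  # labels are integer class ids
    metrics=['accuracy'],
)

# Simple augmentation: random flips, rotations, and shifts
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    horizontal_flip=True,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
)
train_generator = datagen.flow(X_train, y_train, batch_size=32)
```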

Now we can start training our model. Remember, we are only updating the weights in the classification head.
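For example (the epoch count is arbitrary and worth tuning):

```python
# Only the classification head's weights are updated here;
# the frozen MobileNetV2 base just extracts features.
history = model.fit(
    train_generator,
    epochs=10,
    validation_data=(X_test, y_test),
)
```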

To visualize the performance of our model use the following code:
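A simple sketch that plots the training curves stored in the `history` object:

```python
# Accuracy and loss for the training and validation sets, per epoch
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.show()
```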

Fine-tune model

We have fitted our own classification head, but there's one more step we can take to customize the model for our particular application.

We are going to “un-freeze” the later parts of the model, and train it for a few more epochs on our data, so that it is better suited for our specific classification task.

Note that we are not creating a new model. We’re just going to continue training the model we already started training.
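A sketch of that fine-tuning step; the cutoff layer index, learning rate, and epoch count are illustrative choices, not fixed values:

```python
# Un-freeze the base model, then keep only its later layers trainable
base_model.trainable = True
for layer in base_model.layers[:100]:  # freeze the earlier layers
    layer.trainable = False

# Re-compile with a much smaller learning rate so the pre-trained
# features are adjusted gently rather than overwritten.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)

# Continue training from where the previous run left off
fine_tune_history = model.fit(
    train_generator,
    epochs=5,
    validation_data=(X_test, y_test),
)
```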

Classify custom test sample
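Finally, we can run the fine-tuned model on the custom image we uploaded earlier. A sketch, assuming the `img_batch` and `class_names` objects from the previous cells:

```python
# Predict rock/paper/scissors probabilities for the uploaded image
probs = model.predict(img_batch)[0]
for name, p in zip(class_names, probs):
    print(f'{name}: {p:.3f}')

print('Predicted class:', class_names[np.argmax(probs)])
```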

Conclusion

In practice, for most machine learning problems, we wouldn’t design or train a convolutional neural network from scratch — we would use an existing model that suits our needs (does well on ImageNet, size is right) and fine-tune it on our own data.

(In the most extreme cases, we may attempt few-shot, one-shot, or zero-shot learning.)

Transfer learning isn’t only for image classification.

There are many problems that can be solved by taking a VERY LARGE task-generic “feature detection” model trained on a LOT of data, and fine-tuning it on a small custom dataset.

The applications of transfer learning are endless, and it has seen an upsurge in use since the advent of powerful GPUs and computational hardware.