Build an Image Classifier in 50 Lines of Code!

Source: Deep Learning on Medium

Yes! It's a Namo-RaGa classifier :)

From the very first day of my machine learning journey, I was impressed by the way image classifiers work. It is an exciting field where computers have the intelligence to see and identify things the way humans do. Today, I will be talking about creating a custom image classifier.

Many people claim that image classifiers are not our cup of tea, and that they belong only in the hands of internet giants who have access to millions of images. The argument is half correct and, at the same time, exaggerated. It is not easy to build an image classifier for high-resolution images from scratch using our limited set of training images. But we can use a technique called transfer learning to do the same.

By the way, what is this transfer learning?

Transfer learning is a method where we reuse an already trained model and customize it for our own purpose. It is not practical to achieve high accuracy with the limited amount of training data we have, but we can nudge an already trained model to work for our specific task.

Working Logic

In a typical convolutional neural network, the idea is to capture minute, localized details in the initial layers and then, going deeper, extract more generalized details of the image. In the first layers, the kernels are small, with few channels, and the convolution outputs are large matrices. These initial kernels focus on low-level features like lines and edges.

Later on in the network, the number of channels and the channel depth of the kernels keep increasing, while the feature maps shrink due to the max pooling layers. At the very last layers, we have an extracted feature vector in which each neuron's output corresponds to a high-level feature. This feature vector is then fed to a fully connected neural network and a softmax layer to distinguish between the class labels. In short, the convolutional layers extract valuable features from the images, and the fully connected layers that follow do the classification.
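As a rough illustration of how feature maps shrink, the standard output-size formula for convolutions and pooling can be computed directly. The toy layer stack below is an assumption for illustration, not ResNet50's real layout:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# A toy stack: 3x3 'same'-style convs followed by 2x2 max pools.
size = 224
for block in range(5):
    size = conv_out(size, kernel=3, stride=1, padding=1)  # conv keeps the size
    size = conv_out(size, kernel=2, stride=2)             # pooling halves it
    print(f"after block {block + 1}: {size}x{size}")
# 224 -> 112 -> 56 -> 28 -> 14 -> 7
```

Each pooling stage halves the spatial size while the channel count grows, which is exactly the "shrinking feature map, richer features" pattern described above.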

Let's come back to transfer learning :)

A pretrained model's neurons can grab almost all the major features of an image. For example, a well-trained model's final layers can identify features like a nose, an ear, an eye, flowers, etc., and a well-weighted sum of those features is capable of distinguishing between different class labels.

A pretrained CNN typically classifies a fixed set of objects in its final softmax layer. For a custom image classifier, we remove that final softmax layer and add a new softmax layer with the number of class labels that we require. That's all!

Our CNN model is almost done; what is pending is training. During training, when we apply backpropagation, we don't update the weights of the pretrained layers and only adjust the last layer's weights. So basically, it is like a single neural layer attached to the end of the pretrained model. Only the final layer's weights are updated during backpropagation, so that the softmax output can classify our class labels.
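The idea can be sketched in plain NumPy. This is a minimal, illustrative sketch, not the actual Keras mechanics: a fixed random projection stands in for the frozen pretrained layers, and gradient descent updates only the new softmax layer's weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen pretrained layers: a fixed random projection
# (in the real model this would be ResNet50's convolutional stack).
W_frozen = rng.normal(size=(64, 8)) / np.sqrt(64)

def features(x):
    return np.maximum(x @ W_frozen, 0.0)  # frozen: never updated below

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy 2-class data (all names and shapes here are illustrative).
X = rng.normal(size=(100, 64))
y = (X[:, 0] > 0).astype(int)
Y = np.eye(2)[y]  # one-hot labels

# Only the new softmax layer's weights get gradient updates.
W_new = np.zeros((8, 2))
for _ in range(200):
    probs = softmax(features(X) @ W_new)
    grad = features(X).T @ (probs - Y) / len(X)  # backprop stops at the frozen layers
    W_new -= 0.1 * grad

loss = -(Y * np.log(softmax(features(X) @ W_new) + 1e-9)).sum(axis=1).mean()
print(f"cross-entropy after training only the last layer: {loss:.3f}")
```

Note that `W_frozen` never appears on the left of an update: that is exactly what setting `trainable = False` achieves for the ResNet layers in the Keras code later.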

Code Walk Through

In the code below, we will try to classify between two Indian political kingmakers, Narendra Modi and Rahul Gandhi. In the snippet below, we import the libraries required for the task. We are using ResNet50 as our pretrained model; Keras ships with several pretrained models and downloadable weights. The ResNet50 model takes a (224, 224, 3) pixel image as input and has 1000 class labels (like cat, dog, …) in its final softmax layer. The architecture of the ResNet50 model is given below.

from keras.applications import ResNet50
from keras.models import Sequential
from keras.layers import Dense

Since we have to classify between two labels, we set the number of classes to two below. The ResNet50 weights file is an .h5 file that is openly available. Using the Sequential class, we create the model and attach the ResNet weights to it. The include_top flag is set to False so that the final softmax layer is removed; later, we add our own softmax layer with two class labels at the end.

trainable is set to False so that the ResNet layers' weights are not altered during training.

num_classes = 2
resnet_weights_path = '../input/resnet50/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'

my_new_model = Sequential()
my_new_model.add(ResNet50(include_top=False, pooling='avg', weights=resnet_weights_path))
my_new_model.add(Dense(num_classes, activation='softmax'))

# Say not to train first layer (ResNet) model. It is already trained
my_new_model.layers[0].trainable = False

Now it's time to compile the model. Below, we use stochastic gradient descent as the optimizer, categorical cross-entropy as the loss, and accuracy as the metric.

my_new_model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
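To make the loss concrete, here is a small NumPy sketch of what categorical cross-entropy computes for one-hot labels (Keras's own implementation differs in details such as clipping):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-(y_true * np.log(y_pred)).sum(axis=1).mean())

# One-hot labels for two images, classes 0 and 1.
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])  # softmax outputs

loss = categorical_crossentropy(y_true, y_pred)
# -(log 0.9 + log 0.8) / 2 ≈ 0.164
```

The loss only looks at the probability assigned to the true class, so confident correct predictions drive it toward zero.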

ImageDataGenerator is a class used here to feed the training data. It can also augment the data with techniques like rotations, zooms and shifts, creating additional images from the existing ones; in this snippet we only use it to apply ResNet50's preprocessing. The images also have to be resized to (224, 224) pixels. The training and validation generators are created below.

from keras.applications.resnet50 import preprocess_input
from keras.preprocessing.image import ImageDataGenerator

image_size = 224
data_generator = ImageDataGenerator(preprocessing_function=preprocess_input)


train_generator = data_generator.flow_from_directory(
    '../input/photos/photos/train/',
    target_size=(image_size, image_size),
    batch_size=12,
    class_mode='categorical')

validation_generator = data_generator.flow_from_directory(
    '../input/photos/photos/val/',
    target_size=(image_size, image_size),
    class_mode='categorical')

Next, we train the model. The train_generator output is used as the training data. The whole dataset is trained for 3 epochs (the number of epochs has to be tuned by watching the validation loss).

my_new_model.fit_generator(
    train_generator,
    epochs=3,
    validation_data=validation_generator,
    validation_steps=1)

Below are the accuracy reports on the validation data after each epoch. The model reached nearly 91% accuracy after 3 epochs. Keep in mind that I used only around 50 images for the whole training and 12 images for validation.

Let's pick some unseen images of both and check how good our classifier is at distinguishing Rahul Gandhi from Narendra Modi. The results are below :)
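Here is a sketch of how such predictions could be decoded back into names. The folder names (and hence the class_indices mapping) are assumptions here; flow_from_directory assigns indices alphabetically by folder name, and the prediction array stands in for the output of my_new_model.predict(...):

```python
import numpy as np

# Assumed mapping, as reported by train_generator.class_indices.
class_indices = {'modi': 0, 'rahul': 1}
index_to_label = {v: k for k, v in class_indices.items()}

# Hypothetical softmax outputs for three unseen images.
preds = np.array([[0.92, 0.08],
                  [0.15, 0.85],
                  [0.55, 0.45]])

labels = [index_to_label[i] for i in preds.argmax(axis=1)]
print(labels)  # ['modi', 'rahul', 'modi']
```

Each row sums to 1, and argmax over a row picks the class the softmax layer is most confident about.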

Conclusion

I hope you all had a good time getting an idea of image classifiers. In machine learning, it's all about finding the best-tuned weights for our problem, and transfer learning is a powerful technique when we have a limited amount of training data. It considerably lowers the amount of computation and effort, and also improves efficiency. Transfer learning can be applied across different areas of deep learning as well, but computer vision is where it is most widely used.

Thank you all :)