Source: Deep Learning on Medium
Convolutional Neural Networks (CNNs) are pretty powerful architectures when it comes to Image Classification tasks. However, it is less well known that they are equally capable of performing Image Regression tasks.
The basic difference between Image Classification and Image Regression tasks is the nature of the target variable (the thing we are trying to predict): in a Classification task it is categorical, while in a Regression task it is continuous. For example, classifying images into different dog and cat breeds falls under Classification, whereas predicting house prices from images of houses is a Regression task.
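To make the distinction concrete, here is a toy illustration with made-up numbers: a classification prediction is scored against a discrete label (e.g. with cross-entropy), while a regression prediction is scored against a continuous value (e.g. with squared error).

```python
import math

# Classification: the model outputs a probability per class,
# and the loss compares them against the one true class.
probs = [0.7, 0.2, 0.1]          # softmax output over 3 breeds
true_class = 0
ce_loss = -math.log(probs[true_class])

# Regression: the model outputs a single continuous value,
# and the loss measures how far it is from the true value.
predicted_age, true_age = 61.0, 64.0
mse_loss = (predicted_age - true_age) ** 2

print(round(ce_loss, 4))  # 0.3567
print(mse_loss)           # 9.0
```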
I work in the Life Insurance industry, and one of the major changes happening across the globe in this industry is the simplification of customer onboarding. Companies are trying to make the process of selling insurance simpler, more intuitive and hassle-free, and Deep Learning techniques can be pretty useful here. For example, if a Deep Learning model can predict a person's age, gender, smoker status and BMI (a few of the most crucial factors in calculating the right premium for a given coverage amount, or in deciding to decline or postpone coverage), insurance companies can simplify the selling process a great deal for their customers and possibly increase sales.
We know that predicting gender from an image of a person is relatively simple, and this falls under the Image Classification task.
In case you are interested, you can refer to the full workings in the Kaggle Kernel below:
On the other hand, an Image Regression task such as predicting a person's age from an image is relatively difficult to accomplish. The easier way to handle it is to turn it into a classification task by grouping ages into buckets, i.e. by creating age bands. However, this does not serve the purpose as far as insurance selling is concerned (mortality and morbidity rates normally differ significantly by exact age and gender). So, I attempted to create a model that predicts the exact age of the person.
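As a toy illustration of why age bands lose exactly the information an insurer needs (the band width here is an arbitrary choice for the example):

```python
def age_band(age: int, width: int = 10) -> str:
    """Bucket an exact age into a fixed-width band, e.g. 64 -> '60-69'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

# Two people a decade's worth of mortality risk apart share a band:
print(age_band(60))  # 60-69
print(age_band(69))  # 60-69
# ...while two near-identical ages land in different bands:
print(age_band(59))  # 50-59
print(age_band(60))  # 60-69
```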
Problem of Bias and Selection of Data
The main problem when working with images, especially images of people, is that the majority of freely available public-domain data sources are significantly racially biased. Jeremy Howard of Fastai touches upon this point in one of the lectures of his Deep Learning course. These datasets are biased in the sense that most of the captured images are of white people, and if we train a model on them, chances are it won't do well on images of, say, people of Indian or other Asian origin.
To mitigate this issue to a certain extent, I carefully selected three data sources (all available in the public domain):
- IMDB-Wiki Face Dataset ( https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/)
- UTK Face Dataset ( http://aicip.eecs.utk.edu/wiki/UTKFace)
- Appa Real Face Dataset ( http://chalearnlap.cvc.uab.es/dataset/26/description/)
These datasets were created from different sources and cover all ages between 0 and 100, and combining them mitigates the acute racial bias to a certain extent.
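A minimal sketch of combining the three sources into one table of (image path, age) pairs. The file names and field layout below are made-up placeholders for illustration, not the datasets' actual structure:

```python
import random

# Hypothetical: each source reduced to a list of (image_path, age) records.
imdb_wiki = [("imdb_wiki/nm0000123_1990.jpg", 45), ("imdb_wiki/nm0000456_2001.jpg", 33)]
utk_face  = [("utkface/25_0_2_2017.jpg", 25), ("utkface/70_1_3_2016.jpg", 70)]
appa_real = [("appa_real/000042.jpg", 61)]

# Combine and shuffle so no single source dominates any train/valid split.
combined = imdb_wiki + utk_face + appa_real
random.seed(42)          # reproducible shuffle
random.shuffle(combined)

print(len(combined))     # 5
```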
Fastai Course and Library
Before we start the main content of this article, I would like to say a few words of appreciation for the extraordinary course "Practical Deep Learning for Coders (Part 1 and Part 2)" taught by my favorite teacher, Jeremy Howard. It is an amazing course, I have learnt so many things from it, and I can't thank Jeremy and Rachel enough for creating and supporting it.
Besides the course, Fastai has an astounding and thriving community of students, researchers and practitioners who are always ready to help and support fellow learners. Just recently, I got to know that two of my past Deep Learning projects, on which I also wrote Medium articles, were selected among the top 100 such projects by Fastai users to be part of a Deep Learning course at the University of Brasilia. An extremely proud moment for someone who has just started learning the basics of ML / DL. How cool is that! All of this became possible because of this amazing course, an awesome teacher like Jeremy Howard, the thriving community of Fastai users, and the Fastai library itself, which is a wrapper around Pytorch.
For some strange reason, the Kaggle Kernel on which I worked for this project did not get committed successfully. So, I put the notebook on my GitHub:
Here are a few main points worth highlighting:
- The Fastai v1 library was used, and ResNet34 was chosen as the CNN architecture. I tried more complex architectures such as ResNet50, but the validation errors turned out to be higher.
- In this notebook, I used progressive image resizing, in which image sizes are gradually increased during training; this helped in getting higher accuracy. It is a great technique and well worth trying whenever we train CNNs.
- Smooth L1 loss (a form of Huber loss) was used, which behaves better than plain L1 or L2 loss.
- During the project, I learnt to use Fastai's discriminative learning techniques, in which we split the network architecture into different groups and assign different weight decays and learning rates to each group.
- Lastly, Fastai's Pytorch hooks and Spotify's Annoy library were used to create an Image Similarity model (which, in my view, did not work very well).
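The progressive-resizing point above boils down to a simple training schedule: train a few epochs at a small image size, then keep the learned weights and continue at a larger size. The sizes and epoch counts below are illustrative assumptions; in the actual notebook, each stage rebuilds the data at the new size before training continues.

```python
# Illustrative stand-in for "train for n epochs at the current image size".
def train_at_size(model_state, size, epochs):
    # In the real notebook this step would rebuild the data at `size`,
    # attach it to the learner, and train, re-using existing weights.
    return model_state + [(size, epochs)]

model_state = []
for size in (64, 128, 224):          # small -> large
    model_state = train_at_size(model_state, size, epochs=4)

print(model_state)  # [(64, 4), (128, 4), (224, 4)]
```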
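Discriminative learning rates, also mentioned above, assign smaller rates to early layer groups (generic features) and larger ones to later groups (task-specific head). A minimal sketch of the idea; the multiplicative spacing mirrors what Fastai does when given a slice of learning rates, but the exact numbers are illustrative:

```python
def discriminative_lrs(lr_min, lr_max, n_groups):
    """Spread learning rates multiplicatively across n layer groups,
    from lr_min (earliest layers) to lr_max (head)."""
    if n_groups == 1:
        return [lr_max]
    ratio = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]

# Three layer groups, as in a typical Fastai CNN learner:
lrs = discriminative_lrs(1e-5, 1e-3, 3)
print([f"{lr:.0e}" for lr in lrs])  # ['1e-05', '1e-04', '1e-03']
```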
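The image-similarity idea works on embeddings captured via hooks: each image becomes a vector, and similar images are those whose vectors point in nearby directions. Annoy finds such neighbours approximately at scale; as a toy illustration of the underlying principle, here is an exact cosine-similarity search over made-up embeddings:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Made-up 3-dimensional "embeddings" for four images.
embeddings = {
    "img_a.jpg": [0.9, 0.1, 0.0],
    "img_b.jpg": [0.8, 0.2, 0.1],   # points roughly the same way as img_a
    "img_c.jpg": [0.0, 0.1, 0.9],
    "img_d.jpg": [0.1, 0.0, 0.8],   # points roughly the same way as img_c
}

def most_similar(query, embeddings):
    others = {k: v for k, v in embeddings.items() if k != query}
    return max(others, key=lambda k: cosine_similarity(embeddings[query], others[k]))

print(most_similar("img_a.jpg", embeddings))  # img_b.jpg
```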
A glance at the Age Model:
class AgeModel(nn.Module):
    def __init__(self):
        super().__init__()
        layers = list(models.resnet34(pretrained=True).children())[:-2]
        layers += [AdaptiveConcatPool2d(), Flatten()]
        layers += [nn.BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)]
        layers += [nn.Dropout(p=0.50)]
        layers += [nn.Linear(1024, 512, bias=True), nn.ReLU(inplace=True)]
        layers += [nn.BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)]
        layers += [nn.Dropout(p=0.50)]
        layers += [nn.Linear(512, 16, bias=True), nn.ReLU(inplace=True)]
        layers += [nn.Linear(16, 1)]
        self.agemodel = nn.Sequential(*layers)

    def forward(self, x):
        return self.agemodel(x)
Here, you can see that after removing the final layers of ResNet34 that deal with the Classification task, we appended a head that handles the Regression task: pooling, batch norm, dropout, and fully connected layers ending in a single continuous output.
A glance at the Loss Function:
class L1LossFlat(nn.SmoothL1Loss):
    # Flattens input and target before computing the Smooth L1 loss.
    def forward(self, input: Tensor, target: Tensor) -> Rank0Tensor:
        return super().forward(input.view(-1), target.view(-1))
Smooth L1 loss is used, which behaves better than L1 or L2 loss.
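Smooth L1 combines the two classic losses: it is quadratic (like L2) for small errors and linear (like L1) for large ones, which makes it less sensitive to outliers such as mislabeled ages. A plain-Python version of the element-wise formula, using Pytorch's default threshold of 1:

```python
def smooth_l1(error):
    """Smooth L1 (Huber-like) loss for a single prediction error."""
    e = abs(error)
    if e < 1.0:
        return 0.5 * e ** 2       # quadratic near zero, like L2
    return e - 0.5                # linear in the tails, like L1

print(smooth_l1(0.5))   # 0.125
print(smooth_l1(10.0))  # 9.5
```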
A glance at the Learner:
learn = Learner(data_wiki_small, model, model_dir="/temp/model/", opt_func=opt_func,
                bn_wd=False, metrics=root_mean_squared_error)
learn.loss_func = L1LossFlat()
Now, let's look at a few predictions from the model:
To test this model on a random image, I used a picture of Indian PM Modi taken in 2015 (when he was around 64 years of age). Let's see how we fare:
Let's see what the model predicts:
This was one of the longest projects I have ever been involved in, but I must say I learnt a lot in the process: discriminative learning techniques, how to restructure a model architecture to suit the task, progressive image resizing, and more.
If you like my work and this article in particular, a clap would be wonderful.