Source: Deep Learning on Medium
An end-to-end solution for detecting kin relationships from faces.
This post walks you step by step through building a deep neural network model with Keras from scratch and then deploying it to the web using Flask. The problem comes from a competition hosted by Kaggle, which can be found here.
Table of contents:
1- Defining our goal
2- Data and its Description
3- Building Model
5- Productionize the Model
6- The Video
7- To-Do Tasks
Defining our goal:
Do you have your father’s nose sitting on you?
Blood relatives often share facial features. Now researchers at Northeastern University want to improve their algorithm for facial image classification to bridge the gap between research and other familial markers like DNA results.
This technology remains largely unseen in practice for a couple of reasons:
1. Existing image databases for kinship recognition tasks aren’t large enough to capture and reflect the true data distributions of the families of the world.
2. Many hidden factors affect familial facial relationships, so a more discriminant model is needed than the computer vision algorithms used most often for higher-level categorizations (e.g. facial recognition or object classification).
So, we will build a model that determines whether two people are blood-related based solely on images of their faces.
Data and its Description:
We will be using data provided by Families In the Wild (FIW), the largest and most comprehensive image database for automatic kinship recognition.
FIW’s dataset is obtained from publicly available images of celebrities. For more information about their labeling process, please visit their database page.
The folder ‘train’ consists of subfolders for families, with names like F0123. Each family folder in turn contains subfolders for individuals (MIDx). Images in the same MIDx folder belong to the same person. Images in the same F0123 folder belong to the same family.
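The folder layout described above can be walked with the standard library. The following is only a sketch: the function name `index_images` and the 'F0123/MID1' key format are illustrative assumptions, not code from the original notebook.

```python
from collections import defaultdict
from pathlib import Path


def index_images(train_dir):
    """Map 'F0123/MID1'-style person keys to a sorted list of image paths."""
    person_to_images = defaultdict(list)
    # Expected layout: <train_dir>/<family>/<person>/<image file>
    for image_path in sorted(Path(train_dir).glob("*/*/*")):
        if image_path.is_file():
            family, person = image_path.parts[-3], image_path.parts[-2]
            person_to_images[f"{family}/{person}"].append(str(image_path))
    # Person folders without any image never appear as keys.
    return dict(person_to_images)
```

Keying the dictionary by 'family/person' should make it easy to look up the images for an entry of the relationships file, assuming that file uses the same key format.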
Train Folder is shown below:
Each subfolder of ‘train’ is shown below:
Each folder of individuals contains faces of that person:
The folder ‘test’ contains face images, each of which must be paired with another image and tested for kinship.
The file ‘train_relationships.csv’ shown below contains training labels. Remember, not every individual in a family shares a kinship relationship. For example, a mother and father are kin to their children, but not to each other.
Setting up the required libraries to use when needed:
Note: Importing the library ‘keras_vggface’ will fail if it is not installed on your system. To install it, run the command below:
!pip install git+https://github.com/rcmalli/keras-vggface.git
Diving into the data folders and analyzing the train_relationships.csv file, I found some hiccups. For example, the train_relationships.csv file lists a relation between ‘F0039/MID1’ and ‘F0039/MID3’, but there is no ‘F0039/MID3’ folder in the train folder.
Similar issues arise because the following folders are missing:
… and more.
A simple solution to this problem is to ignore these empty or missing directories and consider only the ones available to us.
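One way to apply this filter is sketched below. It assumes the relationships file has a header row `p1,p2` with values such as F0039/MID1; the function name `load_valid_pairs` is hypothetical.

```python
import csv
from pathlib import Path


def load_valid_pairs(csv_path, train_dir):
    """Return (p1, p2) pairs whose person folders both exist and are non-empty."""
    train_dir = Path(train_dir)

    def has_files(person_key):
        folder = train_dir / person_key
        return folder.is_dir() and any(folder.iterdir())

    pairs = []
    with open(csv_path, newline="") as handle:
        # Assumed header: p1,p2 (for example F0039/MID1,F0039/MID3)
        for row in csv.DictReader(handle):
            if has_files(row["p1"]) and has_files(row["p2"]):
                pairs.append((row["p1"], row["p2"]))
    return pairs
```

Pairs such as (‘F0039/MID1’, ‘F0039/MID3’) are dropped by this check because the second folder does not exist.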
Loading the data and splitting it into the train & validation set.
‘val_images’ contains folders of the families with folder names starting ‘F09’ while ‘train_images’ contains all other family folders.
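A split by family prefix could look like the sketch below. The helper name `split_by_family` and the 'F0123/MID1' key format are assumptions made for illustration.

```python
def split_by_family(person_keys, val_prefix="F09"):
    """Split 'F0123/MID1'-style keys into (train, val) lists by family prefix."""
    train, val = [], []
    for key in person_keys:
        (val if key.startswith(val_prefix) else train).append(key)
    return train, val


train_keys, val_keys = split_by_family(
    ["F0001/MID1", "F0900/MID2", "F0123/MID4", "F0999/MID1"]
)
print(train_keys)  # ['F0001/MID1', 'F0123/MID4']
print(val_keys)    # ['F0900/MID2', 'F0999/MID1']
```

Holding out whole families, rather than random images, means no validation family is ever seen during training.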
These two sets still include the empty directories discussed above, so it is time to filter them out:
We have ‘train’ & ‘val’ which contain the family folders for the training and validation process respectively.
After defining our goal for this problem, we explored and understood the nature of the data we have, and we also overcame a simple problem we faced. Now is the time to do some modeling. Recall the goal we set for ourselves at the beginning:
“Given two faces, predict whether they are kin-related or not.”
So basically we have a classification problem at hand to be solved.
Deep Learning Model:
Deep learning architectures have made face recognition far more accurate than earlier classical methods. State-of-the-art models trained on huge datasets can now outperform even humans.
For our problem, we will be using two different architectures that achieved state-of-the-art results on a range of face recognition benchmark datasets. Both systems can be used to extract high-quality features from faces, called face embeddings, which can then be used to judge whether two faces are similar.
1- Facenet: It is a face recognition system developed in 2015 by researchers at Google. It takes an image as input and predicts a 128-dimensional vector or face embedding. So in simple terms, this vector/face embedding now represents that input face in numbers.
2- VGGFace: It was developed by the Visual Geometry Group at Oxford, one of the most prominent groups in image processing. It takes an image as input and predicts a 2048-dimensional vector or face embedding.
These two models will work as our base models, i.e., we will pass our pair of input images to both models and get face embeddings representing the input faces. It will be clear once we build the actual model below.
We have two face images, image_1 and image_2 and both our base models will take each of these images. Facenet will take image_1 and image_2 as input_1 and input_2. VGGFace will take image_1 and image_2 as input_3 and input_4.
Note: The two models expect different input image sizes.
After passing the input images through both the base models we will get face embeddings for both the images. We have
x1- face embedding for image 1 from Facenet model
x2- face embedding for image 2 from Facenet model
x3- face embedding for image 1 from VGGFace model
x4- face embedding for image 2 from VGGFace model
We could feed these embeddings directly into dense fully connected (FC) layers for classification, but a useful feature engineering trick is to combine or merge them first for better results.
For example, squaring the vector x1 may give more information about image_1.
In the same way, we can use many different combinations, such as adding (x1, x2) or multiplying (x3, x4). The code for this is shown below:
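As a minimal sketch of this kind of feature combination, here is a NumPy version. The model itself would express these as Keras layers, and the particular combinations chosen here are illustrative assumptions, not necessarily the ones used in the original code.

```python
import numpy as np


def combine_embeddings(a, b):
    """Build pairwise features from two embeddings of equal length."""
    squared_difference = (a - b) ** 2        # large where the faces differ
    difference_of_squares = a ** 2 - b ** 2  # uses the squared vectors
    product = a * b                          # positive where the signs agree
    total = a + b
    return np.concatenate(
        [squared_difference, difference_of_squares, product, total]
    )


rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=128), rng.normal(size=128)    # stand-ins for Facenet embeddings
x3, x4 = rng.normal(size=2048), rng.normal(size=2048)  # stand-ins for VGGFace embeddings
features = np.concatenate([combine_embeddings(x1, x2), combine_embeddings(x3, x4)])
print(features.shape)  # (8704,)
```

The final length is 4 × 128 + 4 × 2048 = 8704, since each pair contributes four vectors of the embedding size.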
Finally, we concatenate all the new features created so far and pass them through some dense FC layers to perform the binary classification. Our whole model architecture will look like this:
We have our model architecture ready. The next step is to start training it with some loss and optimizer. Before training the model, we need to define some helper functions which will help us during the training and inference steps.
‘read_img_fn’ takes the path of an input image and returns the same image at a predefined size. Remember, we are using different input image sizes for the different base models. Similarly, another function, ‘read_img_vgg’, does the same for the VGGFace model.
A special helper function, ‘generate’, is used to generate batches of image pairs with a fixed batch_size. For every batch, it returns four arrays of images (one pair of inputs for each of the two models) along with the labels.
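The sampling logic of such a generator might look like the sketch below. It is a simplification under stated assumptions: it yields person keys rather than loaded images, it balances each batch with half kin and half non-kin pairs (the original balance is not stated), and the name `generate_pairs` is hypothetical. `kin_pairs` must be a list with at least `batch_size // 2` entries.

```python
import random


def generate_pairs(kin_pairs, people, batch_size=16, seed=None):
    """Yield (pairs, labels) batches: half kin pairs (1), half non-kin pairs (0)."""
    rng = random.Random(seed)
    # Treat kinship as symmetric when checking candidate negatives.
    kin_set = set(kin_pairs) | {(b, a) for a, b in kin_pairs}
    while True:
        # Positive half: sample known kin pairs.
        pairs = rng.sample(kin_pairs, batch_size // 2)
        labels = [1] * len(pairs)
        # Negative half: random pairs of people not listed as kin.
        while len(pairs) < batch_size:
            p1, p2 = rng.sample(people, 2)
            if (p1, p2) not in kin_set:
                pairs.append((p1, p2))
                labels.append(0)
        yield pairs, labels
```

In a full pipeline, each person key would be replaced by a randomly chosen image of that person, read once per base model at that model's input size.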
Loss function and Model Configuration:
As we discussed at the start, it is a binary classification problem. We’ll be using binary cross-entropy or logloss as the loss function for this problem.
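To make the loss concrete, here is a small pure-Python sketch of binary cross-entropy. Keras provides its own implementation; this version only illustrates the formula, and the clipping constant `eps` is an assumption.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean log loss for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 4))  # 0.1054
print(round(binary_cross_entropy([1, 0], [0.1, 0.9]), 4))  # 2.3026
```

Confident correct predictions give a loss near zero, while confident wrong predictions are penalized heavily.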
To configure the model, we track accuracy as the metric during training and use the Adam optimizer with a learning rate of 1e-5.
Finally, we are ready to train the model that we defined above. Using callbacks to store and use the trained model at various points, the final code of training looks like this.
Here we use the ‘generate’ helper function to produce batches of 16 image pairs. After training for several epochs, the model converged and the accuracy stopped improving.
Finally, the trained model is saved with the filename ‘facenet_vgg.h5’.
Now we can use this trained model to predict the probabilities given two input images. We will be using the ‘sample_submission.csv’ file provided by Kaggle to do so. This file contains pairs of images and the model needs to predict the probability of kin relationship.
The resulting file ‘face_vgg.csv’ is then submitted to Kaggle for scoring. Our model performed well, achieving an AUC of 0.887 on the private leaderboard and 0.881 on the public leaderboard.
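For readers unfamiliar with the metric, AUC can be read as the fraction of (kin, non-kin) pairs in which the kin example gets the higher predicted probability. The sketch below computes it by brute force on toy values; it is an illustration of the metric, not Kaggle's scoring code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5  # ties count as half
    return correct / (len(positives) * len(negatives))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Because AUC depends only on the ranking of the predictions, it is unaffected by where the decision threshold is placed.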
From this, we can say that our complex deep learning model is performing well at this task and is ready to go into production.
Path to Production:
We will be using Flask, a Python-based micro web framework, to run a simple web server on localhost. It will act as an API that lets end users communicate with our trained model, so that anyone can access it.
First, let’s build a simple web API using Flask to demonstrate how things work.
We need to create two files.
1- app.py — The Flask script for backend
from flask import Flask, request, render_template

app = Flask(__name__)

@app.route('/')
def hello_world():
    # Render the home page template
    return render_template('index.html')
The above code imports the necessary modules and creates the Flask instance. The function hello_world() renders our home page for the user.
2- index.html — The home page html file which the user will see at the frontend.
Now we have both files we need. After running app.py, the user will be able to see a page saying ‘Hello world’.
Before building, let’s first visualize how the whole thing will work. The user will see a web page asking them to upload the two images they want to check for kinship. At the backend, our Flask script will read the two images, preprocess them, and pass them to our trained model. Finally, the model will predict the probability, which will be shown to the user at the frontend.
Now we’ll go and build our kin-Prediction API.
This is what our project layout looks like:
project
├── templates
│   ├── index.html
│   └── end.html
├── static
├── facenet_vgg.h5
└── app.py
The project folder contains all the files that our Flask app needs.
The templates folder contains the HTML files that Flask renders for the frontend.
The static folder contains the images that the user uploads for prediction.
‘facenet_vgg.h5’ is our trained model, which needs to be saved in the project directory so that Flask can load it directly.
app.py is the main Flask script, which runs all the backend operations.
The below function from app.py will render the home page for the user.
The index.html file looks like this:
<body style="background-color:gray;">
<form action="http://localhost:5000/upload" method="POST"
enctype="multipart/form-data" align="center">
<input type="file" name="file1" />
<input type="file" name="file2" />
<input type="submit" value="Predict" />
</form>
</body>
The page will look like this:
After the user chooses the images and clicks the Predict button, both images are uploaded and saved to the local ‘static’ folder. From there, Flask reads the images, performs the required preprocessing, and finally predicts the probability.
The code for this in app.py file looks like this:
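A hedged sketch of such an upload route follows. It is not the original app.py: `predict_kinship` is a hypothetical placeholder for the preprocessing and model call, and the result is rendered from an inline template instead of ‘end.html’ so that the example stays self-contained. The field names ‘file1’ and ‘file2’ and the ‘/upload’ path come from index.html.

```python
import os

from flask import Flask, render_template_string, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config["UPLOAD_FOLDER"] = "static"  # folder named in the article


def predict_kinship(path_1, path_2):
    """Hypothetical placeholder: the real app would preprocess both
    images and call the trained facenet_vgg.h5 model here."""
    return 0.5


@app.route("/upload", methods=["POST"])
def upload():
    folder = app.config["UPLOAD_FOLDER"]
    os.makedirs(folder, exist_ok=True)
    saved_paths = []
    for field in ("file1", "file2"):  # field names from index.html
        uploaded = request.files.get(field)
        if uploaded is None or uploaded.filename == "":
            return f"Missing upload: {field}", 400
        # secure_filename strips path components from the client-supplied name.
        path = os.path.join(folder, secure_filename(uploaded.filename))
        uploaded.save(path)
        saved_paths.append(path)
    probability = predict_kinship(*saved_paths)
    # Inline template used here only to keep the sketch self-contained.
    return render_template_string("Probability: {{ p }}", p=f"{probability:.3f}")
```

Sanitizing the uploaded filename before saving is a common precaution, since the name is supplied by the client.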
Finally, Flask renders the template ‘end.html’ with both images and the prediction.
<h1 style="font-family:verdana;"> Probability of two persons having kin relation is </h1>
That’s it. Our model is ready and running on the localhost. You can download and refer to the full code here.
I tried this first with images of me and my Dad, and then with images of my Mom and Dad. In both cases, the model works as expected, predicting a high probability (kin-related) for the first pair and a low probability (not kin-related) for the second.
Here is the video which showcases the above experiment.
To-Do Tasks:
1- Instead of just running it on a local server, we can deploy it on a cloud platform like Heroku, so that anyone, anywhere can access the model.
2- Currently our model takes only face images as input, since it is trained only on faces. It will not work if given a full image of a person. We can build a different solution that handles this case.