Source: Deep Learning on Medium
This post describes a neural network built to predict a person’s BMI (body mass index) from an image of their face. The project borrows from another project that classifies a person’s age and gender from their face, reusing both the weights of a trained model and a script that dynamically detects a user’s face through their webcam. In addition to being an interesting machine learning problem, predicting BMI in this way could be a useful tool in medical diagnostics.
The training data consisted of 4000 images, each of a different individual and each taken from the front of the subject. The BMI of each training sample was calculated from the subject’s height and weight (BMI is weight in kilograms divided by the square of height in meters). Although the training images can’t be shared here because they are being used in another private project, this type of data can be scraped from various places online.
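As a minimal sketch of how each training label could be computed (the function name `bmi` is illustrative, not taken from the project), BMI is simply weight in kilograms divided by height in meters squared:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


# Example: a 70 kg subject who is 1.75 m tall.
print(round(bmi(70, 1.75), 2))  # 22.86
```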
To normalize the images before training, each image was cropped to the subject’s face. The Python library dlib was used to detect the face in each image, and an extra margin was added around the border detected by dlib to produce the images actually used in training. Several margins were tried to see which let the network perform best, and a margin of 20% (expanding the crop by 40% in both height and width, 20% on each side) was chosen because it gave the best validation performance.
Shown below are the different margins applied to a cropped image of Bill Murray (not part of the training data), along with a table of the lowest mean absolute error (MAE) the model achieved on the validation set when trained on images with each margin.
Although the MAE values in the 20%–50% margin range are probably too close to one another to say that any one margin is better than the others, it is evident that a margin of at least 20% results in a better MAE than no margin at all. This might be because the added margin captures features such as the upper forehead, the ears, and the neck, which are useful to the model in predicting BMI but are largely cropped out by the original dlib border.
Code for Image Preprocessing:
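A minimal sketch of the margin step, assuming dlib-style (left, top, right, bottom) pixel coordinates. `expand_box` is a hypothetical helper, not the project’s original preprocessing code, and the commented lines only indicate where dlib’s face detector would supply the box.

```python
def expand_box(left, top, right, bottom, margin, img_w, img_h):
    """Grow a (left, top, right, bottom) face box by `margin` of its size on each side.

    With margin=0.2 the box becomes 40% wider and 40% taller (20% per side),
    unless it is clipped by the image edges.
    """
    dx = int(round(margin * (right - left)))
    dy = int(round(margin * (bottom - top)))
    return (
        max(0, left - dx),       # clamp to the left edge
        max(0, top - dy),        # clamp to the top edge
        min(img_w, right + dx),  # clamp to the right edge
        min(img_h, bottom + dy), # clamp to the bottom edge
    )


# Hypothetical use with dlib (requires dlib and an H x W x 3 image array `img`):
#   detector = dlib.get_frontal_face_detector()
#   rect = detector(img, 1)[0]
#   l, t, r, b = expand_box(rect.left(), rect.top(), rect.right(), rect.bottom(),
#                           0.2, img.shape[1], img.shape[0])
#   face = img[t:b, l:r]
```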
To get more use out of each original training image, the images were augmented anew every training epoch. The image augmentation library Augmentor was used to dynamically rotate and flip the images, distort the resolution of different parts of them, and change their contrast and brightness.
Code for Image Augmentation:
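The idea of re-augmenting every epoch can be sketched without Augmentor. The stand-in below is a swapped-in, dependency-light version in plain NumPy: it covers only horizontal flips, brightness, and contrast (rotation and distortion are omitted), and the ±20% ranges and helper names are arbitrary assumptions rather than the original settings.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly flipped, brightened and contrast-adjusted copy of `image`.

    `image` is assumed to be an H x W x 3 float array scaled to [0, 1].
    """
    out = image[:, ::-1, :] if rng.random() < 0.5 else image  # horizontal flip
    out = out * rng.uniform(0.8, 1.2)                         # brightness (new array)
    mean = out.mean()
    out = (out - mean) * rng.uniform(0.8, 1.2) + mean         # contrast around the mean
    return np.clip(out, 0.0, 1.0)                             # keep pixels in range


def augmented_batches(images, bmis, batch_size, rng):
    """Yield one epoch of freshly augmented (images, BMI) batches in shuffled order."""
    order = rng.permutation(len(images))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = np.stack([augment(images[i], rng) for i in idx])
        yield batch, bmis[idx]
```

Calling `augmented_batches` once per epoch gives the network a differently perturbed version of every image each time, while the original arrays are left unchanged.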
The model was created using the Keras ResNet50 class. The ResNet50 architecture was chosen both so that the weights of an extensively trained age classifier from the age and gender project could be reused through transfer learning and because ResNet (residual network) architectures are known to perform well on facial image recognition.
Other network architectures have achieved impressive results on face-based image classification tasks (see here, here and here) and future work could explore some of these architectures for BMI prediction.
Code Implementing Model Architecture:
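A minimal sketch of this setup in Keras, assuming TensorFlow’s bundled Keras. The age network is approximated by a hypothetical `build_age_classifier` (a ResNet50 backbone with global average pooling and a 101-way age softmax, both assumptions), and the hypothetical `attach_bmi_head` swaps its age output for a single linear BMI unit. Neither function comes from the original project.

```python
import numpy as np
from tensorflow import keras


def build_age_classifier(input_shape=(224, 224, 3), num_ages=101):
    """Hypothetical stand-in for the age network: ResNet50 features + age softmax.

    The 101 age classes (0-100) are an assumption, not the original project's setup.
    """
    backbone = keras.applications.ResNet50(
        include_top=False,        # drop the ImageNet classifier
        weights=None,             # random init; trained age weights are loaded later
        input_shape=input_shape,
        pooling="avg",            # global average pooling -> 2048-dim feature vector
    )
    ages = keras.layers.Dense(num_ages, activation="softmax", name="age")(backbone.output)
    return keras.Model(backbone.input, ages)


def attach_bmi_head(age_model):
    """Swap the age output for one linear unit that regresses BMI (kg/m^2)."""
    features = age_model.layers[-2].output  # output of the layer feeding the age head
    bmi = keras.layers.Dense(1, activation="linear", name="bmi")(features)
    return keras.Model(age_model.input, bmi)


def predict_bmi(model, images):
    """Return a 1-D array of BMI predictions for a batch of face crops."""
    batch = np.asarray(images, dtype="float32")
    return np.asarray(model(batch, training=False)).reshape(-1)
```

In practice, the trained age weights would be loaded with `age_model.load_weights(...)` before calling `attach_bmi_head`, which only works if the stand-in’s layer layout matches the network those weights came from.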
Transfer learning was used to take advantage of the age classifier’s weights, which should be valuable for detecting the lower-level facial features used in predicting BMI. The age network was given a new linear regression output layer (outputting a single number representing BMI) and was trained with MAE as the loss function and Adam as the optimizer.
The model was first trained with every layer from the original age classifier frozen, so that only the randomly initialized weights of the new output layer were updated. This first training session lasted 10 epochs, after which early stopping ended it because no noticeable decrease in MAE was being observed.
After this initial session, every layer in the network was unfrozen and the model was trained for 30 more epochs to fine-tune all of its weights. Early stopping determined the number of epochs here as well, stopping training only after 10 consecutive epochs without a decrease in validation MAE (a patience of 10). Because the model reached its lowest validation MAE at epoch 20, training stopped at epoch 30, and the weights from epoch 20 were used in the demos below.
*Graphs generated with TensorBoard
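The relationship between the best epoch, the patience, and the stopping epoch can be illustrated with a small pure-Python simulation. The function is hypothetical and the curve is synthetic (not the actual training logs); it simply shows that, with a patience of 10, a minimum at epoch 20 ends training at epoch 30.

```python
def early_stopping_epochs(val_mae, patience=10):
    """Return (best_epoch, stop_epoch), 1-indexed, for a per-epoch validation MAE list.

    Training stops once `patience` consecutive epochs pass without a new minimum,
    mirroring patience-based early stopping with no minimum improvement threshold.
    """
    best_mae = float("inf")
    best_epoch = 0
    for epoch, mae in enumerate(val_mae, start=1):
        if mae < best_mae:
            best_mae, best_epoch = mae, epoch  # new best: reset the patience window
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch           # patience exhausted: stop here
    return best_epoch, len(val_mae)            # ran out of epochs before stopping


# Synthetic curve: improves until epoch 20, then plateaus at a worse value.
curve = [10.0 - 0.1 * i for i in range(20)] + [9.0] * 15
print(early_stopping_epochs(curve))  # (20, 30)
```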
Mean absolute error was chosen as the loss function, rather than mean squared error (MSE) or root mean square error (RMSE), because MAE penalizes BMI prediction errors linearly: an error of 10 is punished only twice as much as an error of 5, whereas MSE would punish it four times as much.
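The difference between these losses can be checked with a few lines of plain Python (the helper names are illustrative). The second comparison shows why RMSE behaves differently from MAE as well: errors with the same MAE get a larger RMSE when they are concentrated in a single prediction.

```python
import math


def mae(errors):
    """Mean absolute error: every unit of error costs the same."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error: an error's cost grows with its square."""
    return sum(e * e for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error: back in BMI units, but still dominated by large errors."""
    return math.sqrt(mse(errors))


# A single BMI error of 10 versus 5 kg/m^2.
print(mae([10]) / mae([5]), mse([10]) / mse([5]))  # 2.0 4.0
# The same total error, concentrated in one prediction versus spread over two.
print(mae([10, 0]), mae([5, 5]))                   # 5.0 5.0
print(round(rmse([10, 0]), 2), rmse([5, 5]))       # 7.07 5.0
```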
Code for training the model:
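The two training phases can be sketched in Keras as follows. Several details are assumptions rather than the original code: the helper names are hypothetical, the BMI head is identified by its layer name, both phases use a patience of 10 and a cap of 100 epochs, and the learning rates (1e-3, then 1e-4) are illustrative guesses. `restore_best_weights=True` plays the role of keeping the best epoch’s weights, as was done with the epoch-20 weights.

```python
import numpy as np
from tensorflow import keras


def set_trainable(model, head_name, train_all):
    """Freeze every layer except `head_name`, or unfreeze everything."""
    for layer in model.layers:
        layer.trainable = train_all or layer.name == head_name


def fit_with_early_stopping(model, data, learning_rate, max_epochs, patience):
    """Compile with Adam + MAE and train until validation MAE stops improving."""
    x_train, y_train, x_val, y_val = (np.asarray(a, dtype="float32") for a in data)
    # Recompile so the updated `trainable` flags are picked up by fit().
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mean_absolute_error")
    stopper = keras.callbacks.EarlyStopping(
        monitor="val_loss",         # validation MAE, since MAE is the loss
        patience=patience,          # epochs without improvement before stopping
        restore_best_weights=True,  # roll back to the best epoch's weights
    )
    return model.fit(x_train, y_train, validation_data=(x_val, y_val),
                     epochs=max_epochs, callbacks=[stopper], verbose=0)


def train_two_phase(model, head_name, data, max_epochs=100, patience=10):
    """Phase 1 trains only the new head; phase 2 fine-tunes every layer."""
    set_trainable(model, head_name, train_all=False)
    frozen = fit_with_early_stopping(model, data, 1e-3, max_epochs, patience)
    set_trainable(model, head_name, train_all=True)
    tuned = fit_with_early_stopping(model, data, 1e-4, max_epochs, patience)
    return frozen, tuned
```

Keras documentation advises recompiling after changing `trainable`, which is why each phase compiles the model again before fitting.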
Below are the model’s BMI predictions on several images of Christian Bale, chosen as a subject because he is known to change his weight drastically for different roles. Since his height is 6′0″, his weight can be derived from each BMI prediction.
The image on the left is from the movie The Machinist, for which Bale said he was “probably about 135 pounds”. At 135 pounds his BMI would have been 18.3 kg/m² (kg/m² being the unit of BMI), so the model’s prediction is off by about 4 kg/m². The image in the middle is one I thought was representative of his weight when he had not drastically changed it for a role. The image on the right was taken around the time Vice was filmed. I couldn’t find a figure for his weight during filming, but several sources say he gained 45 pounds. If we assume his usual weight is 200 pounds, he would have weighed 245 pounds while filming Vice, giving a BMI of 33.2 kg/m², in which case the model’s prediction for this image is off by about 1 kg/m².
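The BMI figures quoted above follow from the standard formula with unit conversions. A small sketch (function names are illustrative) reproduces them and shows the inverse used to turn a predicted BMI into a weight at a known height:

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound
IN_TO_M = 0.0254       # exact definition of the inch


def bmi_from_imperial(weight_lb, height_in):
    """BMI in kg/m^2 from weight in pounds and height in inches."""
    return weight_lb * LB_TO_KG / (height_in * IN_TO_M) ** 2


def weight_lb_from_bmi(bmi, height_in):
    """Invert the BMI formula to recover weight in pounds for a known height."""
    return bmi * (height_in * IN_TO_M) ** 2 / LB_TO_KG


height_in = 72  # 6'0"
print(round(bmi_from_imperial(135, height_in), 1))  # 18.3  (The Machinist)
print(round(bmi_from_imperial(245, height_in), 1))  # 33.2  (assumed weight for Vice)
```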
Below is a recording of the model predicting my BMI. My BMI is 23 kg/m², so the model seems to be off by about 2–4 kg/m² when I look straight at the camera and by up to 8 kg/m² when I turn my head to the side and downward.
The model achieved a validation MAE of 4.48 kg/m². For a subject who is 5′9″ and 195 pounds (the average height and weight of American males), and who therefore has a BMI of about 28.8 kg/m², an error of 4.48 would result in a prediction range of about 24.3 kg/m² to 33.3 kg/m², corresponding to weights of roughly 165 and 225 pounds, respectively. Clearly there is room for improvement, and future work would attempt to reduce this error.
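The arithmetic behind such a range can be reproduced with a short sketch (the function name is illustrative). Because height is fixed, a BMI error of ±4.48 kg/m² maps to a band of roughly ±30 pounds around the true weight for a 5′9″ subject:

```python
LB_TO_KG = 0.45359237
IN_TO_M = 0.0254


def weight_range_for_bmi_error(weight_lb, height_in, mae):
    """Return (true BMI, (low, high) BMI band, (low, high) weight band in pounds).

    The band is the true BMI plus or minus `mae`; each edge is converted back to
    pounds at the subject's fixed height.
    """
    height_sq = (height_in * IN_TO_M) ** 2
    true_bmi = weight_lb * LB_TO_KG / height_sq
    low_bmi, high_bmi = true_bmi - mae, true_bmi + mae

    def to_lb(bmi):
        return bmi * height_sq / LB_TO_KG

    return true_bmi, (low_bmi, high_bmi), (to_lb(low_bmi), to_lb(high_bmi))


bmi, bmi_band, lb_band = weight_range_for_bmi_error(195, 69, 4.48)  # 5'9", 195 lb
print(round(bmi, 1), [round(b, 1) for b in bmi_band], [round(w) for w in lb_band])
# 28.8 [24.3, 33.3] [165, 225]
```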
An obvious shortcoming of the model is poor performance when evaluating images taken from angles other than from the front of the subject. It can be seen in the recording above that when I move my head to the side or downwards the model’s prediction becomes much less accurate.
Another possible shortcoming, which might help explain the model’s inaccurate prediction for the first image of Christian Bale, is poor performance when the subject is in a dark environment and lit by a concentrated light source. Intense, concentrated light on a subject’s face might change how salient certain features are to the model (shadows are exaggerated, one side of the face might be lighter than the other, and curvatures and subtleties in skin appearance or bone structure might be blotted out), thus influencing the model’s BMI prediction.
It also seems possible that the model simply overestimates the BMI of subjects with lower BMIs in general, as seen in its predictions for the recording of me and for the first image of Christian Bale.
These shortcomings might be explained by how rarely odd angles, concentrated lighting, and lower BMIs appear in the training data. Most training images were taken in good lighting, from the front of the subject, and were of subjects with a BMI above 25 kg/m². Therefore, the model may not have been able to sufficiently learn the relationship between facial features and BMI in these other scenarios.
Training and Running the Model Yourself
Since the training data can’t be released publicly, it isn’t possible to train the model yourself. If you want to see the model in action, however, you can clone this repo and follow the instructions in the README to run a demo. The demo will try to use your computer’s webcam to capture video and predict the BMI of any faces it detects, as in the recording above.