Pupillary Distance Measurement with Fully Convolutional Neural Networks — back in the game

One of my recently completed machine learning projects is a model for pupillary distance measurement. It took more time than I expected at the beginning, but the more challenging the problem became, the more determined I became to solve it.

The purpose of this blog post is to warm up after being idle with writing for too long and to share some experiences from the trenches.

Distance Measurement

The main goal was to automate pupillary distance measurement by using a standard card (a loyalty card or the back of a credit card) as a size reference. Knowing the width and height of the card in millimeters, it is possible to calculate the pixel-to-millimeter ratio from the detected card corners and use this ratio to compute the absolute distance in millimeters between the detected pupils. Below is an image of example detections during the training procedure.

TensorBoard detection display during the training procedure. The upper row shows the actual (annotated) positions of the pupils and card corners; the lower row shows the detections.
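To make the geometry concrete, here is a minimal sketch of the measurement step, assuming the pupil centers and card corners are already available as pixel coordinates; the 85.60 mm card width (the ISO/IEC 7810 ID-1 format used by credit and loyalty cards) and the corner ordering are my assumptions, not taken from the original implementation.

```python
import numpy as np

CARD_WIDTH_MM = 85.60  # ISO/IEC 7810 ID-1 card width (credit/loyalty cards), assumed reference

def pupillary_distance_mm(left_pupil, right_pupil, card_corners):
    """Estimate PD in millimeters from detected points given in pixel coordinates.

    left_pupil, right_pupil: (x, y) tuples.
    card_corners: four (x, y) corners, assumed ordered
                  [top-left, top-right, bottom-right, bottom-left].
    """
    corners = np.asarray(card_corners, dtype=float)
    # Average the top and bottom edge lengths to get the card width in pixels.
    top_edge = np.linalg.norm(corners[1] - corners[0])
    bottom_edge = np.linalg.norm(corners[2] - corners[3])
    card_width_px = (top_edge + bottom_edge) / 2.0

    mm_per_px = CARD_WIDTH_MM / card_width_px
    pd_px = np.linalg.norm(np.asarray(right_pupil, float) - np.asarray(left_pupil, float))
    return pd_px * mm_per_px

# Example usage with made-up detections:
# pupillary_distance_mm((310, 240), (410, 242),
#                       [(280, 400), (520, 402), (518, 555), (282, 553)])
```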

Data Collection and Augmentations

Collecting the data was one of the biggest challenges along the way, and consequently the number of collected images affected the choice and design of the model architecture. Since a neural network requires a lot of images to be trained from scratch, I initially decided to use pre-trained VGG, ResNet, and Inception models, apply transfer learning, and pick the best one. At the beginning of the project I managed to collect approximately 150 images from friends, who took selfies with a card held below the nose. It turned out that people often don't hold the card so that it touches the mouth, which posed another challenge because of lens distortion, so I collected additional selfies taken with the card on the forehead. Altogether, before the initial training, I had collected fewer than 200 images.

In order to extend the dataset, I implemented data augmentation with random translations, rotations, flips, and contrast/color adjustments. Later on, as I desperately needed to increase the dataset size, I also tried random occlusions: rectangles of random sizes drawn at random positions over the images. The latter attempt resulted in a small reduction of the loss.
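As a rough illustration of the occlusion idea, here is a minimal NumPy sketch that draws one randomly sized, randomly placed rectangle over an image; the size limit and fill value are arbitrary assumptions, not the settings used in the project.

```python
import numpy as np

def random_occlusion(image, max_frac=0.3, fill=0, rng=None):
    """Draw one randomly placed, randomly sized rectangle over a copy of `image`.

    image: H x W x C array.
    max_frac: maximum rectangle side as a fraction of the image side (assumed limit).
    fill: pixel value used inside the occluded rectangle.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    occ_h = rng.integers(1, max(2, int(h * max_frac)))
    occ_w = rng.integers(1, max(2, int(w * max_frac)))
    y0 = rng.integers(0, h - occ_h)
    x0 = rng.integers(0, w - occ_w)

    out = image.copy()
    out[y0:y0 + occ_h, x0:x0 + occ_w] = fill
    return out
```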

Initial attempt — Transfer Learning

The struggle

After collecting the initial dataset and implementing the data augmentation, I tried to design a custom fully connected network on top of the pre-trained models, with various combinations of layer counts and numbers of neurons per layer. Additionally, I tried to split the network into two heads, one for classifying the detected points and another for regressing their locations. At this point the real struggle started. None of the combinations seemed promising enough, and I reached the point of wondering whether it would be better to cut my losses or push forward. While thinking about giving up, I also wondered how often we give up too early and miss a solution that might be hidden just around the corner. That thought made me even more determined to solve the problem, but I realized I had to try another approach instead of banging my head against the wall.
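For context, a simplified Keras sketch of such a two-headed setup, assuming a frozen pre-trained ResNet50 backbone, a presence-classification head over the six points, and a coordinate-regression head; the layer sizes and losses are illustrative, not the combinations I actually tried.

```python
import tensorflow as tf

def build_two_headed_model(input_shape=(224, 224, 3), num_points=6):
    """Illustrative two-headed network: point-presence classification + location regression."""
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", input_shape=input_shape)
    backbone.trainable = False  # transfer learning: freeze the pre-trained weights

    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    x = tf.keras.layers.Dense(256, activation="relu")(x)

    # Head 1: is each point (pupils, card corners) visible in the image?
    cls_out = tf.keras.layers.Dense(num_points, activation="sigmoid", name="presence")(x)
    # Head 2: normalized (x, y) coordinates for each point.
    reg_out = tf.keras.layers.Dense(num_points * 2, activation="sigmoid", name="locations")(x)

    model = tf.keras.Model(backbone.input, [cls_out, reg_out])
    model.compile(optimizer="adam",
                  loss={"presence": "binary_crossentropy", "locations": "mse"})
    return model
```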

Translating the Problem to Segmentation

I searched for additional papers on regression problems and, among other things, I tried to transform the problem into a segmentation one. For the annotated images I generated six masks with Gaussian kernels: every point (left/right pupil and the four card corners) was represented as a heatmap with its mean at the point location and a certain variance (the spread of the spot). Then, using these representations as targets, I trained the model to classify each pixel.

Functions to translate the points into heatmaps.
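A minimal sketch of how such functions can look, assuming NumPy arrays, (x, y) pixel coordinates, and an arbitrary sigma controlling the spread:

```python
import numpy as np

def point_to_heatmap(x, y, height, width, sigma=5.0):
    """Render a single point as a 2-D Gaussian heatmap peaking at (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

def points_to_heatmaps(points, height, width, sigma=5.0):
    """Stack one heatmap channel per annotated point (two pupils and four card corners)."""
    maps = [point_to_heatmap(x, y, height, width, sigma) for x, y in points]
    return np.stack(maps, axis=-1)  # shape: (height, width, num_points)
```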

For the optimization criterion, a per-pixel binary classification loss was used.
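Assuming a sigmoid activation on the output heatmaps, this amounts to something like the standard Keras binary cross-entropy applied per pixel:

```python
import tensorflow as tf

# Per-pixel binary classification loss between the target Gaussian heatmaps
# and the predicted heatmaps (sigmoid output assumed).
heatmap_loss = tf.keras.losses.BinaryCrossentropy()

# Example: model.compile(optimizer="adam", loss=heatmap_loss)
```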

In order to find the detection point candidates, I used kernels of various sizes to detect dense regions on the heatmap. After finding a kernel size and density threshold that detected the actual annotated points with a certain spread, the point location was computed as the average (centroid) of that spread.
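A rough sketch of that post-processing step, assuming SciPy is available; the kernel size and density threshold below are placeholders for the values found by experimentation:

```python
import numpy as np
from scipy.ndimage import uniform_filter, label, center_of_mass

def detect_points(heatmap, kernel_size=11, threshold=0.5):
    """Find point candidates on a single-channel heatmap.

    A box kernel averages local activations to find dense regions; pixels above
    the threshold are grouped into connected regions, and each region's centroid
    is returned as a detected point location.
    """
    density = uniform_filter(heatmap, size=kernel_size)
    mask = density > threshold
    labels, num_regions = label(mask)
    # Centroid (average location) of every dense region.
    return center_of_mass(density, labels, list(range(1, num_regions + 1)))
```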

Instead of using the initially chosen architectures, I implemented a Stacked Hourglass Fully Convolutional Neural Network and trained it from scratch.
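To give an idea of the architecture, here is a heavily simplified Keras sketch of a stack of hourglass-style encoder/decoder blocks with skip connections and one sigmoid heatmap per point at each stack; it is not the full Stacked Hourglass network from the original paper, and all sizes are illustrative.

```python
import tensorflow as tf
L = tf.keras.layers

def conv_block(x, filters):
    x = L.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return L.BatchNormalization()(x)

def hourglass(x, filters=64):
    """One simplified hourglass: downsample, process, upsample, with skip connections."""
    skip1 = conv_block(x, filters)
    d1 = L.MaxPooling2D()(skip1)
    skip2 = conv_block(d1, filters)
    d2 = L.MaxPooling2D()(skip2)

    bottleneck = conv_block(d2, filters)

    u2 = L.UpSampling2D()(bottleneck)
    u2 = L.Add()([u2, skip2])
    u1 = L.UpSampling2D()(conv_block(u2, filters))
    u1 = L.Add()([u1, skip1])
    return conv_block(u1, filters)

def build_model(input_shape=(256, 256, 3), num_points=6, num_stacks=2):
    inputs = L.Input(input_shape)
    x = conv_block(inputs, 64)
    outputs = []
    for _ in range(num_stacks):
        x = hourglass(x)
        # One sigmoid heatmap per point (pupils + card corners), supervised at each stack.
        outputs.append(L.Conv2D(num_points, 1, activation="sigmoid")(x))
    return tf.keras.Model(inputs, outputs)
```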

The Evolution of Deep Learning and the Advent of Practical Applications

I became obsessed with machine learning when I first realized how powerful it is by building a model for Diabetic Retinopathy Detection. In 2014, while working on my Ph.D., I decided to enter the competition on Kaggle. It was a moment of truth and, honestly, a little scary: I had spent a lot of time studying the theory and running lab experiments in which I managed to confirm my hypotheses, and then the moment came when I had to check whether that knowledge could be applied outside academia, in the wild.

If I hadn't taken that step back then, I would probably still be living in a bubble today. Among 600 thousand participants, I held 3rd place for some time. I remember discussing the competition with a professor at the time; he said there was no chance to compete against doctors with domain knowledge. Now it has become evident that domain knowledge is no longer that important when problems are tackled at a different level of abstraction, without hand-designing features that rely on domain knowledge. For this to become a common belief, some people had to swim against the mainstream. Similarly, to do anything meaningful, an individual has to explore new paths instead of merely exploiting common beliefs.

After that Kaggle competition, I was looking to work on projects where deep learning is applied to solve practical problems, but somehow I ended up in a different environment where no machine learning work was being done. Nevertheless, I tried to stay up to date with the field.

With transfer learning, experiments are no longer limited by dataset availability; everyone can collect enough data, run the experiments, and deploy the models, and this is the reason new ideas have been keeping me awake at night again. With the right idea at hand, new solutions can be delivered from a basement, without laboratory equipment or a data warehouse.