Source: Deep Learning on Medium
The ultimate explanation for newbies, with relevant links.
What is face recognition? What is deep learning? And most importantly, what power can these two bring together?
If you think that is too many questions for one article, hold on: I can change your mind. In this post, I have done my best to cover all the important concepts and turn them into an easy, comprehensive manual for everyone. All in all, I will explain how face recognition works, along with the deep learning technology that powers it.
The goal is not to give very detailed explanations but to provide a general overview with plenty of links so that you can go and dig deeper if you get interested. Here is a quick overview of the journey:
- The basics of deep learning
- The basics of face recognition
- Key steps to detect a face from a photo
- Deep learning in facial recognition
Without further ado, let’s jump right in!
Step 1. The Basics of Deep Learning
Deep learning is nothing exotic: it is a standard paradigm of machine learning, or more precisely, a family of its algorithms. To a great extent, it is inspired by the human brain and the interaction of neurons. If you start googling what deep learning is, you will notice that this super-hot term is far from new. Why so? The term itself appeared in the 1980s, but until 2012 there was not enough computing power to carry the technology forward, and almost no one paid attention to it.
What happened in 2012? A team led by George Dahl won the Merck Molecular Activity Challenge using multi-task deep neural networks to predict the biomolecular targets of drug candidates. And voilà: this caused a significant boom in the mass-media scene, and many other researchers and developers started working with the technology too.
After a series of articles by famous scientists and publications in scientific journals, the technology quickly went viral. Today it has a variety of applications, and a significant place among them is occupied by face recognition. First of all, deep learning gives us the power to build biometric software capable of uniquely identifying or verifying a person. This is because deep learning methods can leverage very large datasets of faces and learn rich, compact representations of them, allowing modern models first to match and later to outperform the face recognition capabilities of humans.
So, how does deep learning work?
Deep learning systems are modeled after the neural networks in the neocortex of the human brain, where higher-level cognition occurs. In the brain, a neuron is a cell that transmits electrical or chemical information. When connected with other neurons, it forms a neural network. In machines, the neurons are virtual — basically bits of code running statistical regressions. String enough of these virtual neurons together and you get a virtual neural network.
While traditional machine learning algorithms are linear, deep learning algorithms are stacked in a hierarchy of increasing complexity and abstraction. To understand deep learning, imagine a toddler whose first word is “dog.” The toddler learns what a dog is (and is not) by pointing to objects and saying the word “dog.” The parent says, “Yes, that is a dog,” or, “No, that is not a dog.” As the toddler continues to point to objects, he becomes more aware of the features that all dogs possess. What the toddler does, without knowing it, is clarify a complex abstraction (the concept of dog) by building a hierarchy in which each level of abstraction is created with knowledge gained from the preceding level of the hierarchy.
Computer programs that use deep learning go through much the same process. Each algorithm in the hierarchy applies a nonlinear transformation on its input and uses what it learns to create a statistical model as output. Iterations continue until the output has reached an acceptable level of accuracy. The number of processing layers through which data must pass is what inspired the label deep.
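The stacking of nonlinear transformations described above can be sketched in a few lines of NumPy. This is a toy illustration with random, untrained weights, not a real model; the layer sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """One layer of 'virtual neurons': a linear map followed by a
    nonlinear transformation (here, ReLU)."""
    return np.maximum(0.0, x @ w + b)

# A tiny 3-layer network: each layer re-represents its input
# at a higher level of abstraction.
x = rng.normal(size=(1, 8))                    # input features (e.g. raw pixels)
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 16)), np.zeros(16)
w3, b3 = rng.normal(size=(16, 4)), np.zeros(4)

h1 = layer(x, w1, b1)    # low-level features
h2 = layer(h1, w2, b2)   # mid-level features
out = h2 @ w3 + b3       # final statistical model (e.g. class scores)
print(out.shape)         # (1, 4)
```

Training would adjust the weights `w1…w3` so that the final output matches labeled examples; the "depth" is simply the number of stacked layers the data passes through.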
Relevant links to go deeper:
- Introduction to Convolutional Neural Networks by Jianxin Wu
- Gradient-Based Learning Applied to Document Recognition by Yann LeCun et al.
- Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville
- Grokking Deep Learning by Andrew W. Trask
- Machine Learning Yearning by Andrew Ng
Step 2. The Basics of Face Recognition
Let’s first look at how humans recognize faces. Face perception is very complex, as the recognition of facial expressions involves extensive and diverse areas of the brain. Brain-imaging studies typically show a great deal of activity in an area of the temporal lobe known as the fusiform gyrus, an area also known to cause prosopagnosia when damaged (particularly when the damage occurs on both sides). People learn to recognize faces from birth, and by about four months of age they can clearly distinguish one person from another.
The main thing that a person pays attention to is the eyes, cheekbones, nose, mouth, and eyebrows, as well as the texture and color of the skin. At the same time, our brain processes the face as a whole and is able to identify a person even by half of the face. The brain compares the resulting picture with the internal averaged pattern and finds characteristic differences.
Okay, then how does the facial recognition system work?
First of all, the face recognition system needs to find a face in the image and isolate that area. For this, the software can use a variety of algorithms: for example, measuring the similarity of proportions and skin color, extracting contours in the image and comparing them with the contours of faces, or detecting symmetries using neural networks. A classic and highly effective approach is the Viola-Jones method, which can run in real time. With it, the system recognizes faces even when they are rotated up to about 30 degrees.
The method is based on Haar-like features, a set of black-and-white rectangular masks of different shapes. The masks are superimposed on different parts of the image; the algorithm sums the brightness of all the pixels under the white and black parts of each mask and then calculates the difference between those sums. Next, the system compares the results with accumulated training data and, having located the face in the image, continues to track it to select the optimal angle and image quality. For this purpose, motion-vector prediction or correlation algorithms are used.
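A single two-rectangle Haar-like feature is easy to sketch in NumPy. This is a toy illustration: real Viola-Jones detectors evaluate thousands of such features very quickly using integral images, and the rectangle placement here is arbitrary:

```python
import numpy as np

def haar_two_rect(img, top, left, h, w):
    """Two-rectangle Haar-like feature: the brightness sum under the
    left (white) half of the mask minus the sum under the right
    (black) half."""
    half = w // 2
    white = img[top:top + h, left:left + half].sum()
    black = img[top:top + h, left + half:left + w].sum()
    return float(white - black)

# Toy 4x4 'image': bright left side, dark right side -- the kind of
# local contrast a Haar feature responds to (e.g. cheek vs. eye region).
img = np.array([[9, 9, 1, 1],
                [9, 9, 1, 1],
                [9, 9, 1, 1],
                [9, 9, 1, 1]], dtype=float)

print(haar_two_rect(img, 0, 0, 4, 4))  # 72.0 - 8.0 = 64.0
```

A large response means strong contrast between the two halves; a cascade of boosted classifiers over many such features decides "face" or "not face" for each image window.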
Having chosen the best frames, the system proceeds to recognize the face and compare it with an existing database. It works on the same principle an artist uses to paint a portrait: the program finds reference points on the person’s face that make up their individual features. As a rule, the program identifies about 100 such points.
The most important measurements for face recognition programs are the distance between the eyes, the width of the nostrils, the length of the nose, the height and shape of the cheekbones, the width of the chin, the height of the forehead and other parameters. After that, the obtained data are compared with those available in the database, and, if the parameters coincide, the person is identified.
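A toy sketch of this comparison step follows. The measurement names, values, and threshold below are all made up for illustration; a real system compares learned feature vectors, not a handful of hand-picked distances:

```python
import math

def euclidean(a, b):
    """Straight-line distance between two measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(probe, database, threshold=2.0):
    """Return the name of the closest enrolled person, or None if
    nobody is within the (made-up) matching threshold."""
    best_name, best_dist = None, float("inf")
    for name, features in database.items():
        d = euclidean(probe, features)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None

# Hypothetical measurements per person:
# [eye distance, nostril width, nose length, cheekbone height]
database = {
    "alice": [62.0, 31.0, 48.0, 40.0],
    "bob":   [58.0, 35.0, 52.0, 37.0],
}
print(identify([61.5, 31.2, 48.3, 40.1], database))  # alice
```

The threshold is what separates identification ("who is this?") from rejection ("nobody we know"); tuning it trades false matches against false rejections.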
Step 3. Key Steps to Detect a Face from a Photo
Overview of the Steps in a Face Recognition Process. Taken from “Handbook of Face Recognition,” 2011 by Stan Z. Li and Anil K. Jain
Face recognition is actually a sequence of several related steps:
1. First, look at the image and find all the faces in it.
2. Second, focus on each face and determine that, despite an unusual angle or poor lighting, it is still the same person.
3. Third, identify the unique characteristics of the face that distinguish it from other people, for example, the size of the eyes, the elongation of the face, and so on.
4. Finally, compare these unique characteristics with the characteristics of people you already know to determine the person’s name.
The human brain does all this automatically and instantly. In fact, people are so good at recognizing faces that they even see faces in everyday objects. Computers are incapable of such a high level of generalization (at least for the time being…), so you have to teach them every step in the process separately.
You need to build a pipeline that solves each step of the face recognition process separately and passes the result of the current step to the next. In other words, you need to combine several machine learning algorithms into one chain.
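The chain of steps above can be sketched as a pipeline of stub functions. Every function body here is a placeholder standing in for a real algorithm (a detector, an aligner, an embedding network, a matcher), so the returned values are illustrative only:

```python
def detect_faces(image):
    """Step 1: find bounding boxes of all faces in the image."""
    return [(10, 10, 100, 100)]          # placeholder box

def align_face(image, box):
    """Step 2: crop the face and normalize pose and lighting."""
    return {"aligned_crop": box}

def extract_features(face):
    """Step 3: compute a compact feature vector (embedding)."""
    return [0.1, 0.5, 0.3]               # placeholder embedding

def match(features, known_people):
    """Step 4: compare against known embeddings, return closest name."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(known_people, key=lambda name: dist(features, known_people[name]))

def recognize(image, known_people):
    """The full chain: each step feeds its output to the next."""
    names = []
    for box in detect_faces(image):
        face = align_face(image, box)
        features = extract_features(face)
        names.append(match(features, known_people))
    return names

known = {"alice": [0.1, 0.5, 0.3], "bob": [0.9, 0.0, 0.2]}
print(recognize("photo.jpg", known))     # ['alice']
```

The value of the pipeline structure is that each stage can be swapped independently: replace the stub detector with Viola-Jones or a CNN detector, the stub embedding with FaceNet, and the rest of the chain is unchanged.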
Relevant links to go deeper:
- Face Recognition: A Literature Survey
- Human and machine recognition of faces: A survey
- Robust Real-time Object Detection
Step 4. Deep Learning in Facial Recognition
It turns out that characteristics that seem obvious to us humans (for example, eye color) make little sense to a computer analyzing individual pixels in an image. Researchers found that the most effective approach is to let the computer itself determine which characteristics to collect. Deep learning, in turn, allows much better and faster identification.
And this possibility appeared, or rather was discovered, quite recently. Initially, many believed a neural network could not achieve near-human-level performance. Everything changed in 2014, however. Scientists decided to put it to the test using the two best networks of the moment, AlexNet and the network developed by Matthew D. Zeiler and Rob Fergus, and compared them with the responses of different areas of the brain of a macaque monkey that had been taught to recognize certain objects. The objects were drawn from the animal world so that the monkey would not get confused.
Since a monkey cannot simply report what it sees, electrodes were implanted and the response of each neuron was measured directly. It turned out that under normal conditions, the brain cells performed as well as the state-of-the-art model of the time, the Zeiler and Fergus network.
However, as the display speed increased and the amount of noise and number of objects in the image grew, the recognition speed and quality of the human and primate brain dropped dramatically, while even a fairly simple convolutional neural network kept recognizing objects well. That is, on these specific tasks, neural networks officially work better than our brains.
Aside from the AlexNet and Zeiler-Fergus breakthroughs in deep learning for face recognition, there are other milestone systems: DeepFace, the DeepID series, VGGFace, and FaceNet. Knowing this history helps if you want to better understand how face recognition and deep learning came together:
- DeepFace is a facial recognition system based on deep convolutional neural networks, created by a research group at Facebook in 2014. It identifies human faces in digital images. With an accuracy of 97.35% on the Labeled Faces in the Wild benchmark, it was a major leap forward in using deep learning for face recognition.
- DeepID, or “Deep hidden IDentity features,” is a series of systems (DeepID, DeepID2, etc.), first described by Yi Sun et al. in their 2014 paper titled “Deep Learning Face Representation from Predicting 10,000 Classes.” The first system resembled DeepFace, but it was expanded in subsequent publications to support both identification and verification tasks by training with a contrastive loss.
- The VGGFace (for lack of a better name) was developed by Omkar Parkhi, et al. from the Visual Geometry Group (VGG) at Oxford and was described in their 2015 paper titled “Deep Face Recognition.” In addition to a better-tuned model, the focus of their work was on how to collect a very large training dataset and use this to train a very deep CNN model for face recognition that allowed them to achieve then state-of-the-art results on standard datasets.
- FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved then state-of-the-art results on a range of face recognition benchmark datasets. The FaceNet system can be used broadly thanks to multiple third-party open source implementations of the model and the availability of pre-trained models.
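FaceNet’s key idea is training face embeddings with a triplet loss: a matching pair of faces should end up closer together than a non-matching pair, by at least a margin. A toy NumPy sketch with made-up three-dimensional embeddings (real FaceNet embeddings are 128-dimensional):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss: pull the anchor toward a matching
    face (positive) and push it away from a non-matching face
    (negative) by at least `margin` in squared distance."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(d_pos - d_neg + margin, 0.0))

anchor   = np.array([0.1, 0.9, 0.2])
positive = np.array([0.15, 0.85, 0.25])  # same person, different photo
negative = np.array([0.8, 0.1, 0.7])     # different person

loss = triplet_loss(anchor, positive, negative)
print(round(loss, 4))  # 0.0 -- this triplet is already well separated
```

During training, the network’s weights are updated to drive this loss toward zero over many triplets; at inference time, two faces are simply declared a match if their embeddings are close.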
Relevant links to go deeper:
- Deep Face Recognition: A Survey
- Deep face recognition
- FaceNet: A Unified Embedding for Face Recognition and Clustering
- DeepFace: Closing the Gap to Human-Level Performance in Face Verification
Wrapping It Up, or What to Do Next?
Without any doubt, deep learning now shows its great power in face recognition, delivering superhuman performance and highly accurate results. In this guide, you have learned bits and pieces of the history of deep learning and face recognition, how these technologies developed, and how they work now.
If you want to go further and learn how to implement this state-of-the-art pipeline, I recommend reading these incredible articles:
- Building a Facial Recognition Pipeline with Deep Learning in Tensorflow
- Making your own Face Recognition System
Plus, you can take these two great online courses to reinforce the knowledge acquired from this guide: