Original article was published on Deep Learning on Medium
Data Science Solution Architect
Face recognition technology is seen in a different light today. Its use cases range widely, from crime detection to the identification of genetic diseases.
While governments across the world have been investing in facial recognition systems, some US cities like Oakland, Somerville, and Portland, have banned it due to civil rights and privacy concerns.
What is it: a time bomb or a technological breakthrough? This article explains what face recognition is from a technology perspective and how deep learning increases its capabilities. Only by understanding how face recognition technology works from the inside can we grasp what it is capable of.
How Does Facial Recognition Work?
The algorithm behind facial recognition software works a bit like human visual recognition. But while people store visual data in the brain and recall it automatically when needed, a computer must request data from a database and match it against the captured image to identify a human face.
In a nutshell, a computerized system equipped with a camera detects a human face and extracts facial features such as the distance between the eyes, the length of the nose, and the shape of the forehead and cheekbones. The system then recognizes the face by matching it against images stored in a database.
However, traditional face recognition technology is still far from perfect. It has both strengths and weaknesses.

Strengths:
- Contactless biometric identification
- Data processing within one second
- Compatibility with most cameras
- Ease of integration

Weaknesses:
- Bias against twins and certain races
- Data privacy issues
- Vulnerability to presentation attacks (PA)
- Low accuracy in poor lighting conditions
Realizing the weaknesses of face recognition systems, data scientists went further. By combining traditional computer vision techniques with deep learning algorithms, they fine-tuned face recognition systems to prevent attacks and enhance accuracy. This is how face anti-spoofing technology operates.
How Deep Learning Upgrades Face Recognition Software
Deep learning is one of the newest ways to improve face recognition technology. The idea is to extract face embeddings from images containing faces; such embeddings are unique to each face. Training a deep neural network is the most effective way to perform this task.
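As a toy illustration of the idea (pure NumPy, with random vectors standing in for real dlib or FaceNet embeddings), two embeddings of the same face should lie close together in the feature space, while a different face lands far away:

```python
import numpy as np

rng = np.random.default_rng(42)

# Random 128-d vectors stand in for real face embeddings here.
anchor = rng.normal(size=128)                          # embedding of one face image
same_face = anchor + rng.normal(scale=0.05, size=128)  # another image of the same face
other_face = rng.normal(size=128)                      # embedding of a different face

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

print(euclidean(anchor, same_face))   # small: the same person
print(euclidean(anchor, other_face))  # large: a different person
```

A real system replaces the random vectors with the output of a trained network, but the comparison step stays exactly this simple.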
Depending on the task and timeframe, there are two common ways to use deep learning in face recognition systems:

1. Use pre-trained models such as dlib, DeepFace, FaceNet, and others. This method takes less time and effort because pre-trained models already come with a set of algorithms for face recognition. Pre-trained models can also be fine-tuned to reduce bias and make the face recognition system work properly.
2. Develop a neural network from scratch. This method suits complex face recognition systems with multi-purpose functionality. It takes more time and effort, and requires millions of images in the training dataset, unlike a pre-trained model, which needs only thousands of images in the case of transfer learning.
But if the facial recognition system requires unique features, developing a network from scratch may be the better option in the long run. The key points to pay attention to are:
- The correct selection of CNN architecture and loss function
- Inference time optimization
- Hardware capacity
It is recommended to use convolutional neural networks (CNNs) when developing a network architecture, as they have proven effective in image recognition and classification tasks. To get the expected results, it is better to build on a generally accepted neural network architecture such as ResNet or EfficientNet.
When training a neural network for face recognition, the goal in most cases is to minimize errors. Here it is crucial to choose the loss function used to calculate the error between the actual and predicted output. The functions most commonly used in facial recognition systems are triplet loss and AM-Softmax.
- The triplet loss function uses three images of two different people: two images, the anchor and the positive, show one person, and the third, the negative, shows another. Network parameters are learned so that images of the same person move closer together in the feature space while images of different people move apart.
- The AM-Softmax function is one of the most recent modifications of the standard softmax function. It applies a particular regularization based on an additive margin, which achieves better separability of classes and therefore improves face recognition accuracy.
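The triplet loss described above can be sketched in a few lines of NumPy. The 2-d vectors and the margin value here are toy numbers chosen for illustration, not from a real network:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on embedding vectors: pull the anchor towards the
    positive (same person) and push it away from the negative (different
    person) by at least `margin` in squared Euclidean distance."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(0.0, d_pos - d_neg + margin))

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same person, close to the anchor
n = np.array([1.0, 1.0])   # different person, far away

print(triplet_loss(a, p, n))  # 0.0 -- the margin constraint is already satisfied
print(triplet_loss(a, n, p))  # 2.19 -- swapped roles violate the constraint
```

During training this value is averaged over many triplets and minimized by gradient descent, which is what shapes the embedding space.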
There are also several approaches to improving a neural network. For facial recognition systems, the most interesting are knowledge distillation, transfer learning, quantization, and depthwise separable convolutions.
- Knowledge distillation involves two networks of different sizes, where a large network teaches its own smaller variation. The key value is that, after training, the smaller network runs faster than the large one while producing the same results.
- Transfer learning improves accuracy by training the whole network, or only certain layers, on a specific dataset. For example, if a face recognition system shows racial bias, we can train the network on a targeted set of images (say, pictures of Chinese people) to reach higher accuracy.
- Quantization improves a neural network's processing speed. By approximating a network that uses floating-point numbers with one that uses low-bit-width numbers, we can reduce the memory footprint and the number of computations.
- Depthwise separable convolutions are a class of layers that allow building CNNs with far fewer parameters than standard CNNs. Thanks to the small number of computations, this technique makes facial recognition systems suitable for mobile vision applications.
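Two of these techniques can be illustrated with simple arithmetic. This sketch, assuming a 3x3 kernel with 128 input and output channels, shows how many weights a depthwise separable convolution saves, and how naive int8 quantization shrinks a float32 weight buffer fourfold:

```python
import numpy as np

def conv_params(k, c_in, c_out):
    # A standard k x k convolution mixes all input channels for every output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # One k x k depthwise filter per input channel, then a 1x1 pointwise mix.
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 128, 128)                  # 147456 weights
separable = depthwise_separable_params(3, 128, 128)  # 17536 weights, roughly 8x fewer
print(standard, separable)

# Naive int8 quantization of a float32 weight tensor: 4x less memory.
weights32 = np.linspace(-1.0, 1.0, 1024, dtype=np.float32)
scale = np.abs(weights32).max() / 127.0
weights8 = np.round(weights32 / scale).astype(np.int8)
print(weights32.nbytes, weights8.nbytes)  # 4096 vs 1024 bytes
```

Production quantization schemes are more careful (per-channel scales, calibration data), but the memory arithmetic is the same.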
A key requirement of deep learning technologies is high-powered hardware. When using deep neural networks for face recognition, the goal is not only to enhance recognition accuracy but also to reduce response time. That is why a GPU, for example, is better suited than a CPU for deep learning-powered face recognition systems.
How We Implemented Deep Learning-Powered Face Recognition App
When developing Big Brother (a demo camera app) at MobiDev, we aimed to create biometric verification software with real-time video streaming. Big Brother is a local console app for Ubuntu and Raspbian, written in Golang and configured with a local camera ID and camera reader type via a JSON config file. This video shows how Big Brother works in practice:
From the inside, Big Brother app’s working cycle comprises:
1. Face detection
The app detects faces in a video stream. Once a face is captured, the image is cropped and sent to the back end via an HTTP form-data request. The back-end API saves the image to the local file system and adds a record with a personID to the Detection Log.
The back end is written in Golang and uses MongoDB Collections to store employee data. All API requests follow the RESTful style.
2. Instant face recognition
The back end runs a background worker that finds new unclassified records and uses dlib to calculate a 128-dimensional descriptor vector of face features. Each calculated vector is compared with the reference face images by computing the Euclidean distance to every feature vector of every Person in the database in order to find a match.
If the Euclidean distance to a known person is less than 0.6, the worker sets that personID on the Detection Log record and marks it as classified. If the distance exceeds 0.6, the worker creates a new Person record and logs its ID.
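The worker's matching rule can be sketched as follows. The person IDs and reference vectors are made up for illustration (real descriptors come from dlib), but the 0.6 cutoff is the one used above:

```python
import numpy as np

THRESHOLD = 0.6  # Euclidean distance cutoff used by the worker

def classify(descriptor, known_people):
    """Compare a 128-d face descriptor against every known Person's
    reference descriptor; return the closest matching ID, or None when
    a new Person record should be created instead."""
    best_id, best_dist = None, float("inf")
    for person_id, reference in known_people.items():
        dist = float(np.linalg.norm(descriptor - reference))
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist < THRESHOLD else None

# Hypothetical reference descriptors for two known people.
known = {"person-1": np.zeros(128), "person-2": np.ones(128)}

print(classify(np.full(128, 0.01), known))  # person-1 (distance is about 0.11)
print(classify(np.full(128, 0.5), known))   # None -> create a new Person
```

A linear scan like this is fine for hundreds of people; larger databases usually switch to an approximate nearest-neighbor index.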
3. Follow-up actions: alerting, access granting, and more
Images of an unidentified person are sent to the corresponding manager with notifications via chatbots in messengers. In the Big Brother app, we used Microsoft Bot Framework and Python-based Errbot, which allowed us to implement the alert chatbot within five days.
Afterward, these records can be managed via the Admin Panel, which stores photos with IDs in the database. The face recognition software works in real time and performs recognition tasks almost instantly. The employee ID database currently includes 200 entries.
Here is how the Big Brother face recognition app is designed:
If the system scaled up to 10,000 entries, we would recommend improving the face recognition back end to maintain high recognition speed. One of the best options is parallelization: by setting up a load balancer and running several web workers, we can ensure that the back end operates properly and the entire system maintains optimal speed.
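As a minimal sketch of that idea, with thread workers standing in for separate web workers behind a load balancer, a 10,000-entry search can be split into shards scanned in parallel (all data here is synthetic):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
database = rng.normal(size=(10_000, 128))  # 10,000 synthetic reference descriptors
query = database[1234] + 0.01              # a slightly perturbed copy of entry 1234

def nearest_in_shard(start, shard):
    # Each worker scans only its shard and reports the local best match.
    dists = np.linalg.norm(shard - query, axis=1)
    i = int(np.argmin(dists))
    return start + i, float(dists[i])

shards = [(s, database[s:s + 2_500]) for s in range(0, 10_000, 2_500)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda args: nearest_in_shard(*args), shards))

best_index, best_dist = min(results, key=lambda r: r[1])
print(best_index)  # 1234 -- the perturbed entry is found
```

In a real deployment the shards would live on separate worker processes or machines, with the load balancer merging their answers the same way the final `min` does here.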
Other Deep Learning-Based Recognition Use Cases
Face recognition is not the only task where deep learning-based software development can enhance performance. Other examples include:
AI-based visual inspection

In the last couple of years, manufacturers have been using AI-based visual inspection for defect detection. Deep learning algorithms allow such systems to detect the tiniest scratches and cracks automatically, eliminating the human factor.
Body abnormalities detection
Israel-based company Aidoc developed a deep learning-powered solution for radiology. By analyzing medical images, this system detects abnormalities in a chest, c-spine, head, and abdomen.
Speaker identification

Speaker Identification technology created by the Phonexia company identifies speakers using a metric learning approach. The system recognizes speakers by voice, producing mathematical models of human speech called voiceprints. These voiceprints are stored in a database, and when a person speaks, the technology identifies them by their unique voiceprint.
Emotion recognition

Recognizing human emotions is a doable task today. By tracking facial movements via a camera, emotion recognition technology categorizes human emotions. The deep learning algorithm identifies the landmark points of a human face, detects a neutral facial expression, and measures deviations from it to recognize more positive or negative expressions.
Smart home cameras

The Visual One company enhanced Nest Cams with AI. Using deep learning techniques, they fine-tuned the cameras to recognize not only objects such as people, pets, and cars, but also actions. The set of recognizable actions is customizable and selected by the user; for example, a camera can recognize a cat scratching the door or a kid playing with the stove.