Original article was published on Artificial Intelligence on Medium
Facial Recognition for Kids of all Ages
In this 4-part article series, complex topics are broken down for beginners to understand. The contents are: 1 — Facial detection, 2 — Facial recognition, 3 — Face recognition in Python, 4 — Facial recognition in a masked future.
Facial Recognition: The introduction
According to Wikipedia, “A facial recognition system is a technology capable of identifying or verifying a person from a digital image or a video frame from a video source.” In simple terms, face recognition is used to see if an image contains a particular person’s face.
Facial detection is a type of object-class detection: finding every object of a given class (here, faces) in an image. Current uses of face recognition and detection include filters on social media, facial unlock on certain smartphones, and US border control.
There are two stages of facial recognition:
Stage 1: Facial detection: The detection of faces in digital images.
Stage 2: Facial recognition: The attribution of detected faces to a particular person.
Stage 1 — Facial detection
For this article, we will detect frontal human faces.
Computers are excellent at mathematical and repetitive operations but aren’t good at cognitive tasks. On the other hand, we humans can very easily tell if we are seeing a human face, and we adjust for minor variations since we have been learning how humans look for years and years. For example, ever since our earliest days, we have been able to distinguish between our parents’ faces and our toys.
This is due to the interconnected network of neurons in our brain, something computers don’t have. This vast set of neurons allows us to adapt to new situations without prior experience. The only advantage computers have in this cognitive scenario is their ability to quickly index and recall information from absurdly large databases.
Neurons in our brain
For example, the US population is currently estimated to be around 338 million. The first and last names of every inhabitant can be stored in less than 1.5 gigabytes, nearly 1/100th of an average C: drive’s size (around 120–200 GB). Each of these names can easily be retrieved. However, we humans often have great trouble remembering a few sentences from our textbooks during exams.
Our computer needs to learn how to detect faces, so we will be working in the field of Machine Learning. There are many ways to do this, including Neural Networks (named after the networks of neurons in our brain).
So, how can a computer detect or recognize faces? One method is to generate a vector of 128 numbers (a “face map”) describing a person’s facial features. When another image is taken, the computer generates a new face map and uses Euclidean distance to check whether it is close enough to a face map already in the database. This method can be implemented using the dlib library. However, we will be using something else.
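To make the face-map idea concrete, here is a minimal sketch of the comparison step only. It does not use dlib; the names `known_faces`, `find_match`, and `MATCH_THRESHOLD` are hypothetical, the face maps are made-up 4-number lists rather than real 128-number ones, and the cutoff of 0.6 is an assumption chosen for illustration.

```python
import math

# Hypothetical 4-number "face maps" keep the example readable;
# real face maps (for example from dlib) have 128 numbers each.
known_faces = {
    "alice": [0.10, 0.80, 0.30, 0.50],
    "bob": [0.90, 0.20, 0.70, 0.10],
}

# Assumed cutoff: distances below this count as "the same person".
MATCH_THRESHOLD = 0.6


def find_match(new_face, database, threshold=MATCH_THRESHOLD):
    """Return the name of the closest stored face, or None if none is close enough."""
    best_name, best_distance = None, float("inf")
    for name, stored_face in database.items():
        distance = math.dist(new_face, stored_face)  # Euclidean distance
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance < threshold else None


print(find_match([0.12, 0.79, 0.28, 0.52], known_faces))  # very close to alice's map
print(find_match([0.50, 0.50, 0.50, 0.90], known_faces))  # too far from both maps
```

The smaller the distance, the more alike two face maps are. A stricter (smaller) threshold means fewer false matches but more missed ones.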
Let’s set 3 requirements for our facial detection method: accuracy, efficiency, and speed. What can meet these needs? The Haar cascade model.
What are Haar-like features?
A Haar-like feature is made by looking at a point on a face, drawing imaginary rectangles around the point, adding the pixel intensities in each rectangle, and calculating their differences.
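As a rough illustration, the simplest kind of Haar-like feature uses two side-by-side rectangles. The sketch below assumes NumPy is available and that the image is already a 2-D array of pixel intensities. The function name `two_rectangle_feature` and the tiny test image are made up for this example.

```python
import numpy as np


def two_rectangle_feature(gray, top, left, height, width):
    """Edge-type Haar-like feature: sum of the left half minus sum of the right half.

    `gray` is a 2-D array of pixel intensities. The window starts at
    (top, left) and spans `height` rows and `width` columns (width should be even).
    """
    half = width // 2
    left_rect = gray[top:top + height, left:left + half]
    right_rect = gray[top:top + height, left + half:left + width]
    # Sum as 64-bit integers so the subtraction cannot wrap around.
    return int(left_rect.sum(dtype=np.int64)) - int(right_rect.sum(dtype=np.int64))


# A tiny 4x4 "image": bright on the left, dark on the right.
patch = np.array([
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
], dtype=np.uint8)

print(two_rectangle_feature(patch, 0, 0, 4, 4))  # 1520
```

A large value means there is a strong bright-to-dark edge inside the window. A value near zero means both halves look about the same.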
By itself, a single Haar-like feature isn’t that useful. But thousands of these features can be combined to make a Haar cascade. A Haar cascade is the computer’s definition of a face, building on the idea that human faces are similar. (You and I look similar to each other. I don’t look like a bird, and you don’t look like an alligator.)
The main strength of the Haar cascade method is its processing speed. Because it uses summed-area tables (integral images)¹, the sum of any rectangle of pixels can be found with just a few lookups, so Haar cascades can run in real time, detecting faces in a live video stream.
Original vs. Integral image
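Here is a small sketch of how an integral image speeds things up, assuming NumPy is available. Each cell of the table holds the sum of all pixels above and to the left of it, so the sum of any rectangle needs only four lookups. The helper names `integral_image` and `rect_sum` are made up for this example, and the extra row and column of zeros is one common convention, not the only one.

```python
import numpy as np


def integral_image(gray):
    """Summed-area table with an extra row and column of zeros at the top and left."""
    table = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    table[1:, 1:] = gray.cumsum(axis=0, dtype=np.int64).cumsum(axis=1)
    return table


def rect_sum(table, top, left, height, width):
    """Sum of the pixels in a rectangle using only four table lookups."""
    bottom, right = top + height, left + width
    return int(table[bottom, right] - table[top, right]
               - table[bottom, left] + table[top, left])


image = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
], dtype=np.uint8)

table = integral_image(image)
print(rect_sum(table, 1, 1, 2, 2))  # 28, the same as 5 + 6 + 8 + 9
print(rect_sum(table, 0, 0, 3, 3))  # 45, the sum of the whole image
```

The table is built once per image. After that, every rectangle sum costs the same four lookups no matter how big the rectangle is, which is what makes checking many Haar-like features affordable.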
Object detection using Haar cascades was proposed by Paul Viola and Michael Jones. Their 2001 research paper “Rapid Object Detection using a Boosted Cascade of Simple Features”² suggests that this method can be trained to detect essentially any kind of object.
Digital images are commonly stored in the Red-Green-Blue (RGB) format and are made up of many dots called pixels. Each pixel has 3 channels: Red, Green, and Blue. The value in each of these channels is stored in binary (0s and 1s), typically as a number from 0 to 255. In this way, computers can store an image in a digital form.
However, Haar cascades work on grayscale images, where each pixel has a single value ranging from 0 (black) to 255 (white). If cascades factored in color, they would take too much time and resources to compute, losing their advantage in real-time face detection.
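The step from color to grayscale can be sketched as a weighted average of the three channels. The weights below are an assumption (a widely used set that counts green most and blue least), and the function name `to_grayscale` is made up for this example, which relies on NumPy.

```python
import numpy as np

# Assumed luminosity weights for red, green, and blue (ITU-R BT.601).
RGB_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_grayscale(rgb):
    """Turn a (height, width, 3) RGB array into a (height, width) grayscale array."""
    gray = rgb.astype(np.float64) @ RGB_WEIGHTS
    return np.rint(gray).astype(np.uint8)


# A 1x3 image: one pure black, one pure white, and one pure red pixel.
pixels = np.array([[[0, 0, 0], [255, 255, 255], [255, 0, 0]]], dtype=np.uint8)

gray = to_grayscale(pixels)
print(gray.shape)  # (1, 3): one number per pixel instead of three
print(gray)
```

After conversion, each pixel carries one number instead of three, so there is only a third as much data to add up when computing features.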
Haar-like Features in Action:
In the next article, we will be talking about Stage 2 — facial recognition using LBPH models. After that, we will build our first facial recognition program!
¹Crow, Franklin (1984). Summed-area tables for texture mapping. SIGGRAPH ’84: Proceedings of the 11th annual conference on Computer graphics and interactive techniques. pp. 207–212.
²Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Retrieved May 20, 2020, from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.6807