Original article was published by VAIBHAV HARIRAMANI on Artificial Intelligence on Medium
Face Identification using Haar cascade classifier
Learn how to develop a face identification system using a Haar cascade classifier
A facial identification system is a technology capable of identifying a person's face in a digital image or in a frame from a video source.
A cascade classifier, namely a cascade of boosted classifiers working with Haar-like features, is a special case of ensemble learning called boosting. It typically relies on AdaBoost classifiers (and variants such as Real AdaBoost, Gentle AdaBoost, or LogitBoost).
Cascade classifiers are trained on a few hundred sample images that contain the object we want to detect (positive samples) and other images that do not contain it (negative samples).
How can we detect whether a face is present or not?
The Viola–Jones object detection framework includes all the steps required for live face detection:
- Haar feature selection, using features derived from Haar wavelets
- Creating an integral image
- AdaBoost training
- Cascading classifiers
The Haar cascade classifier is based on the Haar wavelet technique, which analyzes the pixels of an image in rectangular regions. It uses the “integral image” concept to compute the detected “features” quickly.
It follows the Viola–Jones detection algorithm: given a set of face and non-face training images, it trains a classifier that identifies faces.
I.1.a. Haar Feature Selection
There are some features that we find on most human faces:
- a dark eye region compared to the upper cheeks
- a bright nose bridge region compared to the eyes
- the specific relative locations of the eyes, mouth, nose, and so on
These characteristics are called Haar features.
Haar features are similar to convolution kernels: each one is used to detect the presence of a particular feature in the given image.
Detect edges using convolution kernels:
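Given an input image and a convolution kernel, we place the kernel at a corner of the image, multiply it element-wise with the pixels underneath and sum the products, then shift the kernel across the image and repeat.
Using different kernels, this method detects different types of edges (horizontal, vertical, and so on).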
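As a minimal illustration of the sliding-kernel procedure, here is a plain-Python sketch (no NumPy or OpenCV, to keep it self-contained). Like CNN layers, it actually computes a cross-correlation (the kernel is not flipped) with no padding. The 5×5 image and the Prewitt-style kernel are hypothetical examples, not taken from the original figures.

```python
def convolve2d(image, kernel):
    """'Valid' sliding-window filtering (cross-correlation, as in CNNs), no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the kernel element-wise with the patch under it and sum
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# A 5x5 image with a vertical edge: dark (0) on the left, bright (9) on the right
image = [[0, 0, 9, 9, 9] for _ in range(5)]

# Vertical-edge kernel: responds to left-to-right intensity changes
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

response = convolve2d(image, vertical_edge)
for row in response:
    print(row)  # each row prints [27, 27, 0]
```

The response is large where the kernel straddles the dark-to-bright edge and zero over the uniform bright region.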
A Haar feature is just like a kernel in a CNN, except that in a CNN the kernel values are learned during training, while a Haar feature is designed by hand.
Common Haar features include two “edge features”, used to detect edges, a “line feature”, and a “four-rectangle feature”, most likely used to detect a slanted line.
(Figure: Haar features applied to an image of a girl.)
Each feature results in a single value, calculated by subtracting the sum of the pixels under the white rectangle from the sum of the pixels under the black rectangle.
Each Haar feature is designed to resemble, and therefore respond to, a particular part of the face.
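To make the black-minus-white rule concrete, here is a small sketch of one two-rectangle feature. The layout (a black rectangle over the top half, meant to sit on the eyes, and a white rectangle over the bottom half, on the upper cheeks) and the 4×4 pixel values are hypothetical.

```python
def rect_sum(image, x, y, w, h):
    """Sum of the pixels in the w x h rectangle whose top-left corner is (x, y)."""
    return sum(image[r][c] for r in range(y, y + h) for c in range(x, x + w))

def eye_cheek_feature(image, x, y, w, h):
    """Hypothetical two-rectangle feature: black on the top half, white on the bottom half.
    Value = (sum under black) - (sum under white)."""
    half = h // 2
    black = rect_sum(image, x, y, w, half)
    white = rect_sum(image, x, y + half, w, half)
    return black - white

eye_patch = [[40] * 4, [40] * 4, [200] * 4, [200] * 4]  # dark rows above bright rows
flat_patch = [[120] * 4 for _ in range(4)]              # uniform region

print(eye_cheek_feature(eye_patch, 0, 0, 4, 4))   # → -1280
print(eye_cheek_feature(flat_patch, 0, 0, 4, 4))  # → 0
```

The large magnitude on the eye-like patch, and zero on the flat patch, is what makes the feature informative. The sign simply follows the black-minus-white convention.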
Viola–Jones uses a 24×24 base window and calculates the above features all over the image, shifting by 1 pixel at a time.
If we consider all possible parameters of the Haar features, such as position, scale, and type, we end up calculating more than 160,000 features. So we need to evaluate a huge set of features for every 24×24 window.
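Where does the 160,000+ figure come from? One way to check it is to enumerate every position and scale of each feature shape inside a 24×24 window. This sketch assumes the five basic shapes (two-rectangle features side by side and stacked, three-rectangle line features side by side and stacked, and one four-rectangle feature), each scaled by integer multiples of its base size. Other counting conventions give different totals.

```python
# (base width, base height) of the five basic rectangle feature shapes
FEATURE_SHAPES = {
    "two-rectangle, side by side (2x1)": (2, 1),
    "two-rectangle, stacked (1x2)": (1, 2),
    "three-rectangle, side by side (3x1)": (3, 1),
    "three-rectangle, stacked (1x3)": (1, 3),
    "four-rectangle (2x2)": (2, 2),
}

def count_placements(base_w, base_h, window=24):
    """Count every position and scale of one feature shape inside a window x window region."""
    total = 0
    for w in range(base_w, window + 1, base_w):      # widths are multiples of the base width
        for h in range(base_h, window + 1, base_h):  # heights are multiples of the base height
            total += (window - w + 1) * (window - h + 1)  # top-left positions that fit
    return total

counts = {name: count_placements(bw, bh) for name, (bw, bh) in FEATURE_SHAPES.items()}
for name, n in counts.items():
    print(f"{name}: {n}")
print("total:", sum(counts.values()))  # → total: 162336
```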
To avoid this, the idea is to discard redundant features and keep only those that are really useful. This can be done using AdaBoost.
I.1.b. The integral image
Computing the rectangle features in a convolution-kernel style can take a long time. For this reason, the authors, Viola and Jones, proposed an intermediate representation of the image: the integral image. The integral image allows the sum over any rectangle to be computed using only four values. Let’s see how it works!
Suppose we want to determine the rectangle features at a given pixel with coordinates (x,y). The integral image at that pixel is the sum of the pixels above and to the left of it, inclusive:

$$ii(x,y) = \sum_{x' \le x,\; y' \le y} i(x',y')$$

where ii(x,y) is the integral image and i(x,y) is the original image.
When you compute the whole integral image, you can use a recurrence that requires only one pass over the original image. Indeed, we can define the following pair of recurrences:

$$s(x,y) = s(x,y-1) + i(x,y)$$
$$ii(x,y) = ii(x-1,y) + s(x,y)$$

where s(x,y) is the cumulative row sum, s(x,−1) = 0, and ii(−1,y) = 0.
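The pair of recurrences translates almost line by line into code. This is a minimal plain-Python sketch, assuming the image is stored as a list of rows so that img[y][x] holds i(x,y). The out-of-range terms s(x,−1) and ii(−1,y) are treated as 0, as stated above.

```python
def integral_image(img):
    """One-pass integral image using the two recurrences.
    img[y][x] is i(x, y); the result satisfies ii[y][x] = sum of i(x', y') for x' <= x, y' <= y."""
    height, width = len(img), len(img[0])
    s = [[0] * width for _ in range(height)]   # cumulative row sum s(x, y)
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # s(x, y) = s(x, y-1) + i(x, y), with s(x, -1) = 0
            s[y][x] = (s[y - 1][x] if y > 0 else 0) + img[y][x]
            # ii(x, y) = ii(x-1, y) + s(x, y), with ii(-1, y) = 0
            ii[y][x] = (ii[y][x - 1] if x > 0 else 0) + s[y][x]
    return ii

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(ii)  # → [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
```

Each pixel is visited exactly once, and each step needs only values computed earlier in the same pass.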
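How is this useful? Consider a region D whose pixel sum we would like to compute. We define three other regions: A, the region above and to the left of D; B, directly above D; and C, directly to the left of D. Points 1, 2, 3, and 4 are the bottom-right corners of A, B, C, and D respectively.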
- The value of the integral image at point 1 is the sum of the pixels in rectangle A.
- The value at point 2 is A + B
- The value at point 3 is A + C
- The value at point 4 is A + B + C + D.
Therefore, the sum of the pixels in region D is simply 4 + 1 − (2 + 3), since (A + B + C + D) + A − (A + B) − (A + C) = D.
So after a single pass over the image, we can compute the sum inside any rectangle using only 4 array references.
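Here is a sketch of the four-reference lookup. For convenience (an assumption of this sketch, not something the article prescribes) it pads the integral image with a leading row and column of zeros. The padding plays the role of the ii(−1,y) = 0 boundary condition and removes special cases at the image border. The function names are illustrative.

```python
def padded_integral_image(img):
    """Integral image with a leading row and column of zeros:
    P[y][x] = sum of img[y'][x'] for y' < y and x' < x."""
    h, w = len(img), len(img[0])
    P = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            P[y + 1][x + 1] = P[y][x + 1] + row_sum
    return P

def rect_sum(P, x, y, w, h):
    """Sum of the w x h region D whose top-left pixel is (x, y), with four references."""
    p1 = P[y][x]          # point 1: A
    p2 = P[y][x + w]      # point 2: A + B
    p3 = P[y + h][x]      # point 3: A + C
    p4 = P[y + h][x + w]  # point 4: A + B + C + D
    return p4 + p1 - (p2 + p3)

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
P = padded_integral_image(img)
print(rect_sum(P, 1, 1, 2, 2))  # → 34 (6 + 7 + 10 + 11)
print(rect_sum(P, 0, 0, 4, 4))  # → 136 (whole image)
```

Once P is built, the cost of a rectangle sum no longer depends on the rectangle's size. This is why Haar features can be evaluated at any scale in constant time.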
I.1.c. Learning the classification function with AdaBoost
As stated previously, there are more than 160,000 feature values to calculate within a detector at the 24×24 base resolution. However, only a small subset of these features is actually useful for identifying a face.
AdaBoost is used to remove redundant features and choose only relevant features.
Given a set of labeled training images (positive or negative), AdaBoost is used to:
- select a small set of features
- and train the classifier
AdaBoost selects the best features among these 160,000+ features. Each selected feature, together with a threshold, acts as a weak classifier. Once these features are found, a weighted combination of them is used to decide whether any given window contains a face.
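The following is a compact sketch of discrete AdaBoost used for feature selection, in the spirit of Viola–Jones. Each weak classifier is a single feature with a threshold θ and a polarity p, and it predicts “face” when p·f(x) < p·θ. The sketch makes several simplifications that are assumptions rather than the paper's exact procedure: uniform initial weights, thresholds searched only at observed feature values, and a tiny hand-made dataset in which each window is already reduced to three hypothetical Haar feature values.

```python
import math

def stump_predict(x, j, theta, p):
    """Weak classifier on feature j: 1 ("face") if p * x[j] < p * theta, else 0."""
    return 1 if p * x[j] < p * theta else 0

def best_stump(X, y, w):
    """Try every feature, threshold and polarity; keep the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for theta in sorted({x[j] for x in X}):
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, theta, p) != yi)
                if best is None or err < best[0]:
                    best = (err, j, theta, p)
    return best

def adaboost(X, y, rounds):
    """Return the strong classifier as a list of (alpha, feature, theta, polarity)."""
    w = [1.0 / len(X)] * len(X)  # simplification: uniform initial weights
    strong = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]            # normalize to a distribution
        err, j, theta, p = best_stump(X, y, w)  # select one feature
        eps = max(err, 1e-10)                   # avoid dividing by zero
        beta = eps / (1.0 - eps)
        strong.append((math.log(1.0 / beta), j, theta, p))
        if err == 0:
            break  # this single feature already separates the training set
        # Shrink the weights of correctly classified windows so the
        # next round concentrates on the mistakes
        w = [wi * (beta if stump_predict(xi, j, theta, p) == yi else 1.0)
             for xi, yi, wi in zip(X, y, w)]
    return strong

def classify(strong, x):
    """Weighted vote: face if the score reaches half of the total alpha."""
    score = sum(a * stump_predict(x, j, theta, p) for a, j, theta, p in strong)
    return 1 if score >= 0.5 * sum(s[0] for s in strong) else 0

# Hypothetical values of three Haar features for six training windows
X = [[3, 7, 9], [6, 2, 8], [1, 5, 7],   # faces (label 1)
     [4, 6, 2], [2, 1, 1], [5, 3, 3]]   # non-faces (label 0)
y = [1, 1, 1, 0, 0, 0]

strong = adaboost(X, y, rounds=5)
print([j for _, j, _, _ in strong])      # → [2]
print([classify(strong, x) for x in X])  # → [1, 1, 1, 0, 0, 0]
```

On this toy data only feature 2 separates faces from non-faces, so AdaBoost selects it in the first round and stops. On real data, each round adds another weak classifier, and the final decision is the weighted vote described above.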
I.1.d. Cascading Classifier
Although the process described above is quite efficient, a major issue remains: most of an image is non-face region. Giving equal importance to every region of the image makes no sense, since we should focus mainly on the regions that are most likely to contain a face. Viola and Jones achieved an increased detection rate while reducing computation time by using cascading classifiers.
The key idea is to reject sub-windows that do not contain faces while identifying regions that do. Since the task is to identify faces correctly, we want to minimize the false negative rate, i.e., the proportion of sub-windows that contain a face but are not identified as such.
A series of classifiers is applied to every sub-window, arranged like a simple (degenerate) decision tree:
- if the first classifier is positive, we move on to the second
- if the second classifier is positive, we move on to the third
A negative result at any stage leads to the rejection of the sub-window as a potential face. The initial classifier eliminates most negative examples at a low computational cost, and the following classifiers eliminate additional negative examples but require more computational effort.
The stages are trained using AdaBoost, and their thresholds are then adjusted to minimize the false negative rate. When training such a model, the variables are the following:
- the number of classifier stages
- the number of features in each stage
- the threshold of each stage
The job of each stage is to determine whether a given sub-window is definitely not a face or may be a face. A sub-window is immediately discarded as not a face if it fails any of the stages.
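The early-rejection logic of a cascade can be sketched as follows. Each stage is a weighted vote of weak classifiers compared with a stage threshold, and evaluation stops at the first stage that fails. The three stages, their feature indices, thresholds, and the example windows are hypothetical values chosen for illustration, not trained ones.

```python
def weak(x, j, theta, p):
    """Weak classifier on feature j: 1 if p * x[j] < p * theta, else 0."""
    return 1 if p * x[j] < p * theta else 0

def passes_stage(x, weak_classifiers, stage_threshold):
    """A stage is a weighted vote of its weak classifiers."""
    score = sum(alpha * weak(x, j, theta, p) for alpha, j, theta, p in weak_classifiers)
    return score >= stage_threshold

def cascade_detect(x, stages):
    """Return (is_face, stages_evaluated) for one sub-window's feature vector x."""
    for k, (weak_classifiers, stage_threshold) in enumerate(stages, start=1):
        if not passes_stage(x, weak_classifiers, stage_threshold):
            return False, k  # rejected: later, more expensive stages never run
    return True, len(stages)

# Hypothetical stages: ([(alpha, feature index, theta, polarity), ...], stage threshold)
STAGES = [
    ([(1.0, 0, 5, -1)], 1.0),                                    # 1 cheap feature
    ([(1.0, 1, 4, -1), (0.5, 2, 2, -1)], 1.0),                   # 2 features
    ([(1.0, 0, 7, -1), (1.0, 1, 6, -1), (1.0, 2, 3, -1)], 2.0),  # 3 features
]

windows = {"face-like": [8, 7, 5], "background": [2, 9, 9], "near miss": [6, 3, 1]}
for name, x in windows.items():
    print(name, cascade_detect(x, STAGES))
# face-like (True, 3)
# background (False, 1)
# near miss (False, 2)
```

The background window is rejected after a single one-feature stage. Since most sub-windows in a real image look like this, that is where the cascade saves most of its computation.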
Luckily, OpenCV already provides this whole model pre-trained for face detection.
The Viola–Jones face detector has been trained and its weights are stored on disk. All we do is load the features from the file and apply them to our image; if a face is present in the image, we get its location.
You can find OpenCV's face detection model, the “haarcascade_frontalface_default.xml” file, here.
After detecting one or more faces in an image, we can implement a deep learning face recognition system that classifies a given face, i.e., recognizes the person.
To see an implementation of the Haar cascade classifier and face recognition using OpenCV in Python, check out this blog here.
I made a quick YouTube illustration of the face detection algorithm.
Drawbacks of Haar Cascade Classifiers
While cascade methods are extremely fast, they leave much to be desired. If you’ve ever used the Haar cascade classifiers provided by OpenCV (i.e., the Viola–Jones detectors) to detect faces, you’ll know exactly what I’m talking about.
In order to detect faces/humans/objects/whatever in OpenCV (and remove the false positives), you’ll spend a lot of time tuning the parameters of the detectMultiScale method of cv2.CascadeClassifier. And there is no guarantee that the exact same parameters will work from image to image. This makes batch-processing large datasets for face detection a tedious task, since you’ll be very concerned with either (1) falsely detecting faces or (2) missing faces entirely, simply due to poor parameter choices on a per-image basis.
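Part of the tuning burden comes from the scaleFactor parameter. detectMultiScale scans the image at a sequence of scales, each one scaleFactor times coarser than the previous. The sketch below is a simplified model of that scale pyramid. It is an assumption that ignores minSize, maxSize, and OpenCV's internal details, and it counts how many scales a 24×24 detector visits on a hypothetical 640×480 image.

```python
def pyramid_scales(image_w, image_h, window=24, scale_factor=1.1):
    """Return the detector scales (1.0, f, f**2, ...) at which a window x window
    detector still fits inside the image. Simplified model of a scale pyramid."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    scales = []
    s = 1.0
    while window * s <= min(image_w, image_h):
        scales.append(s)
        s *= scale_factor
    return scales

for f in (1.05, 1.1, 1.3):
    print(f, len(pyramid_scales(640, 480, scale_factor=f)))
# prints: 1.05 62, then 1.1 32, then 1.3 12
```

A larger scaleFactor visits fewer scales, so detection is faster but more likely to step over a face's true size. A smaller one is slower and tends to produce more overlapping candidates, which minNeighbors then has to filter. Neither choice is right for every image.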
To address this issue, there is the Histogram of Oriented Gradients (HOG) approach, which we will discuss in a new blog. Until then, stay tuned.
Here is the original paper, published in 2001 by Viola & Jones.
This article was originally published on my personal blog: https://vaibhavhariramani.blogspot.com/2020/04/a-full-guide-to-face-detection.html
The GitHub repository for this article (and all the others from my blog) can be found here:
vaibhavhariaramani / FaceDetection
Thank you for reading!
Please give 👏🏻 claps if you like the blog.