Building a real-time smile detection app with deeplearn.js and the web shape detection API — Part 2 Image processing
This is the second post in a series explaining how to achieve real-time smile detection using deeplearn.js.
In the previous post we successfully made use of the Shape Detection API to find faces in real-time from a video feed. Now we need to extract the faces from the feed and format them in a way that a neural network can process efficiently.
This will require a number of steps:
- Crop the video so we are left with only the detected faces
- Resize the cropped faces so they are much smaller
- Convert the cropped resized image to greyscale
- Get the normalised pixel data for the images
Each of these steps makes the process of feeding the image into the neural network much more efficient.
We want to use the smallest image possible so that the number of pixels the network has to process is kept to a minimum. This is achieved by cropping the feed to just contain the faces, and scaling it down to 50 x 50 pixels.
You don’t really need colour in an image to determine if someone is smiling. By converting the faces to greyscale we also save a bit of processing power, as the 3 red, green, and blue values for each pixel are converted into 1.
Finally we will normalise the pixel values that will eventually be passed into the network. This means instead of having an array of values between 0 and 255, we have an array of values between 0 and 1.
If you have enabled Chrome’s experimental features you can see a demo here: https://face-extraction.netlify.com/
Notice the little grey face in the top left.
The finished code for this section can also be found here: https://github.com/zefman/smiley/tree/feature/face_detection
If you haven’t read the first post in this series please do so, otherwise this won’t make much sense!
Cropping the faces and resizing
To crop out and resize the faces we will use an additional smaller canvas. For the time being we will continue to place everything in the App component “src/app/app.component.ts”.
Here you can see the additional canvas we add to the app component. We set it to 50 x 50 pixels, meaning our resized image will have 2500 individual pixels. This should be small enough for the neural network to consume.
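The template change might look something like this (a sketch, since the original snippet isn’t shown here; the element reference and class names are my own assumptions):

```html
<!-- src/app/app.component.html (sketch) -->
<video #video class="video" autoplay></video>
<canvas #canvas class="canvas"></canvas>
<!-- Small canvas the cropped faces will be drawn to, fixed at 50 x 50 px -->
<canvas #faceCanvas class="face-canvas" width="50" height="50"></canvas>
```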
The rest of the HTML is untouched other than the addition of a few classes.
Moving onto “src/app/app.component.css”:
We will place the smaller canvas above everything in the top left. This canvas doesn’t actually need to be visible, but it will be useful to see the results of our image processing while in development.
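A minimal rule to pin the small canvas to the top left might look like this (a sketch; the class name matches the assumed template above and is not from the original):

```css
/* src/app/app.component.css (sketch) */
.face-canvas {
  position: absolute;
  top: 0;
  left: 0;
  z-index: 10; /* keep it above the video and main canvas */
}
```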
Defining variables in the app component
We need to define another ViewChild to get a reference to the new faceCanvas. We also define two more variables to hold the native canvas element and its rendering context.
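The new component members could be declared roughly as follows (a sketch; the property names are assumptions based on the article’s descriptions):

```typescript
// src/app/app.component.ts (sketch)
@ViewChild('faceCanvas') faceCanvasRef: ElementRef;

faceCanvas: HTMLCanvasElement;
faceCtx: CanvasRenderingContext2D;
```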
Referencing the DOM elements
Just like we did for the other elements we get a reference to the new canvas and its context after the view initialises.
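Following the pattern from the previous post, the wiring inside ngAfterViewInit might look like this (a sketch; property names are assumptions):

```typescript
// Inside ngAfterViewInit(), after the existing video/canvas setup (sketch)
this.faceCanvas = this.faceCanvasRef.nativeElement;
this.faceCtx = this.faceCanvas.getContext('2d');
```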
Processing the faces
We then create a new function called processFace. This function takes a face object returned by the face detector and uses its bounding area to copy that portion of the larger canvas to our new 50 x 50 px one, resizing it at the same time. This is achieved by passing the original canvas to the drawImage function of the faceCtx, along with the area to crop from.
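The crop-and-resize step might be sketched like this (the boundingBox property comes from the FaceDetector result used in the previous post; other names are assumptions):

```typescript
// Sketch of the start of processFace. The 9-argument form of drawImage
// crops the source rectangle from the main canvas and scales it into the
// 50 x 50 destination rectangle on the small face canvas in one call.
processFace(face) {
  const { x, y, width, height } = face.boundingBox;
  this.faceCtx.drawImage(this.canvas, x, y, width, height, 0, 0, 50, 50);
  // ...greyscale conversion of the small canvas happens next
}
```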
We then grab hold of the new, smaller image’s image data and convert it to greyscale. The image data is originally a 1-dimensional array of values between 0 and 255. Each pixel is made up of 4 entries in the array, e.g. [ 45, 200, 202, 255, 76, 98, 201, 255, 253, 222, 98, 0, … ] (the first four values make up one pixel). The first value is the red value for that pixel, the second is the green, the third is the blue, and the fourth is the alpha value.
To convert these individual pixels to greyscale we create a for loop that increments by 4 each time, allowing us to modify the individual rgba values of each pixel. We then compute a new brightness value that replaces the original pixel values, by taking a weighted sum of the rgb parts of the pixel. You might notice the r value is multiplied by .3, the g by .59, and the b by .11. This accounts for the way our eyes are differently sensitive to each colour, and produces a more natural looking greyscale result.
Once we have set each pixel’s rgb values to its new brightness value, we place the modified data back onto the canvas with the putImageData function.
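The greyscale loop described above can be factored into a small standalone function, which might look like this (a sketch; the function name is my own):

```typescript
// Convert RGBA pixel data to greyscale in place. Each pixel occupies four
// consecutive entries (r, g, b, a), so the loop steps by 4. The weighted
// sum accounts for the eye's differing sensitivity to red, green, and blue.
function toGreyscale(data: number[] | Uint8ClampedArray): void {
  for (let i = 0; i < data.length; i += 4) {
    const brightness = data[i] * 0.3 + data[i + 1] * 0.59 + data[i + 2] * 0.11;
    data[i] = brightness;     // red
    data[i + 1] = brightness; // green
    data[i + 2] = brightness; // blue
    // data[i + 3] (alpha) is left untouched
  }
}
```

Inside processFace this would run on imageData.data between the getImageData and putImageData calls.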
Finally we need to use the processFace function in our update to continually draw the detected faces to our small canvas.
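Hooked into the detection loop from the previous post, the call might look something like this (a sketch; the surrounding loop and variable names are assumptions):

```typescript
// Inside the existing update loop, once the detector has returned faces
faces.forEach(face => this.processFace(face));
```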
You should now have a strange black and white face updating in the top left of your browser! This won’t work well with multiple faces, but that’s not a problem as we are only checking that the processFace function works.
The final thing we need to do is get the normalised image data for the face. At the moment, if we were to get the image data from the faceCtx we would have an array of values from 0 to 255, and each pixel would have the same value set for its rgb values. This doesn’t provide the neural network with any additional information, so we may as well take only the r value of each pixel. On top of that we need to normalise the values so they fall between 0 and 1 rather than 0 and 255. To do this we will create two new functions to be used later on.
Hopefully, after the above description, the code makes sense. We pass getNormalizedGreyScalePixels the imageData from our faceCtx; it takes the r value from each pixel, normalises it, and adds it to a new array.
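The two functions might be sketched like this (getNormalizedGreyScalePixels is named in the article; the normalize helper and exact signatures are my assumptions):

```typescript
// Scale a single pixel value from the 0-255 range into the 0-1 range
function normalize(value: number): number {
  return value / 255;
}

// Take only the red channel of each greyscale pixel (every 4th entry)
// and normalize it, giving a flat array of 2500 values for a 50 x 50 face
function getNormalizedGreyScalePixels(
  imageData: { data: number[] | Uint8ClampedArray }
): number[] {
  const pixels: number[] = [];
  for (let i = 0; i < imageData.data.length; i += 4) {
    pixels.push(normalize(imageData.data[i]));
  }
  return pixels;
}
```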
And we’re done – phew!
I hadn’t realised before starting this project how much of the effort would go into formatting the data correctly before even getting to the machine learning element.
So at this point we have successfully taken the detected faces and modified them in a way that will make them more easily processable by the neural network which we will be creating in a forthcoming post.
In the next post we will look into how we can save labeled training data to teach the neural network to recognise smiling faces. This will involve saving the image data into two sets, one smiling, one not smiling.
In the meantime feel free to get in contact with me on Twitter: @jozefmaxted
Building a real-time smile detection app with deeplearn.js was originally published in The Unit on Medium.