How does the stunningly realistic face swap algorithm work? And how can we use it to find the hottest girl on the net?
You may have already heard about DeepFake; if not, I’m sure you will hear about it very soon.
Not so long ago, a Reddit user called deepfakes posted some highly realistic but fake videos. Using a smart deep learning algorithm, he swapped the faces of the original actors with those of other famous people. Yes, before you ask, it was adult content. Like so many things in the history of humankind, this great invention was also driven by t̶h̶e̶ ̶m̶i̶l̶i̶t̶a̶r̶y̶ porn.
Certainly, its application area is much wider than simply producing stunningly realistic adult movies. I̶t̶ ̶a̶l̶s̶o̶ ̶i̶n̶c̶l̶u̶d̶e̶s̶ ̶p̶u̶t̶t̶i̶n̶g̶ ̶N̶i̶c̶ ̶C̶a̶g̶e̶ ̶t̶o̶ ̶e̶v̶e̶r̶y̶ ̶m̶o̶v̶i̶e̶s̶!̶ Do you remember some of the recent tragic deaths of Hollywood actors, and how challenging and expensive it was to finish their last movies afterwards? Using deepfakes, this kind of task can be done at home on a desktop computer in a matter of hours, and in remarkably high quality.
So how does this great algorithm work? We can break it down into 3 major steps:
(0. Collect pictures of your to-be-faked target person and select a video to be modified.)
We only want to swap faces, so our first step must be to identify faces in pictures. Once we have found their position, we can also identify their orientation and size. Using this information, we can warp every picture into a common format.
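To make the "orientation and size" idea a bit more concrete, here is a minimal sketch. It assumes we already know where the two eyes are (the article does not say how orientation is measured, so the eye landmarks, the function name `alignment_from_eyes` and the `target_eye_distance` parameter are all illustrative assumptions). From the line between the eyes we can read off how much the face is tilted and how much it needs to be scaled.

```python
import math


def alignment_from_eyes(left_eye, right_eye, target_eye_distance=64.0):
    """Describe how to normalize a face, given two (x, y) eye positions.

    Hypothetical helper: returns the tilt of the eye line in degrees,
    the scale factor that brings the eye distance to target_eye_distance,
    and the midpoint between the eyes (a natural rotation center).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.degrees(math.atan2(dy, dx))          # orientation
    scale = target_eye_distance / math.hypot(dx, dy)  # size
    center = ((left_eye[0] + right_eye[0]) / 2,
              (left_eye[1] + right_eye[1]) / 2)       # position
    return angle, scale, center


# A face that is already level and already has the target eye distance
# needs no rotation and no scaling.
angle, scale, center = alignment_from_eyes((100, 100), (164, 100))
```

Rotating by `-angle` around `center` and scaling by `scale` would bring every face into the same pose. Keeping these three values around matters later, because they are exactly what we need to undo the warp.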
Face detection is a traditional machine learning problem. One common way to do it is to use the Histogram of Oriented Gradients (or simply HOG). This method looks at each pixel of our picture and compares its darkness to that of the surrounding pixels. Then we can draw an arrow (the gradient) pointing from the lighter to the darker points. We will use this arrow-based representation in place of our original images. There is a HOG pattern that was trained on thousands of faces. We can compare it to our converted images to find areas that are very similar, thereby identifying faces.
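The "arrows" can be sketched in a few lines. This is a simplified illustration, not a full HOG implementation: it takes one small grayscale cell (a list of rows of brightness values), computes the gradient at every interior pixel, and collects the directions into a histogram. The function name `hog_cell_histogram` and the choice of 9 bins are assumptions made for the example. Real HOG detectors add further steps, such as normalizing over blocks of cells, that are left out here.

```python
import math


def hog_cell_histogram(cell, bins=9):
    """Histogram of gradient orientations for one grayscale cell.

    Simplified sketch: each interior pixel votes for the bin of its
    gradient direction, weighted by how strong the gradient is.
    Directions are folded into 0..180 degrees, so an arrow and its
    opposite (light-to-dark vs. dark-to-light) land in the same bin.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    height, width = len(cell), len(cell[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal change
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical change
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat area, no arrow to draw
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A cell that gets brighter from left to right: every arrow is horizontal,
# so all votes end up in the first bin.
ramp = [[x * 10 for x in range(8)] for _ in range(8)]
histogram = hog_cell_histogram(ramp)
```

A picture is described by many such histograms laid out on a grid, and that grid is what gets compared against the trained face pattern.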
Using our extracted face pictures we can train an autoencoder.
An autoencoder is a type of neural network (the algorithm behind deep learning). Its input is a picture and its output is the same picture. Doesn’t make any sense, right?
Autoencoders consist of 2 parts: an encoder and a decoder. The encoder learns a much shorter representation of the input data, while the decoder is able to transform it back to the original data. So the encoder compresses the data into a shorter form and the decoder translates it back to the original, longer format.
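Here is a toy version of that idea, written without any deep learning library so that nothing is hidden. It is only a sketch under strong assumptions: the "pictures" are lists of 4 numbers, the short representation has 2 numbers, and both encoder and decoder are single linear layers. The class `TinyAutoencoder`, the function `train_autoencoder` and all sizes and learning rates are made up for illustration. A real face-swapping model would use deep convolutional networks instead.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class TinyAutoencoder:
    """Linear encoder + linear decoder, trained to reproduce its input."""

    def __init__(self, input_size, code_size, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
                    for _ in range(code_size)]
        self.dec = [[rng.uniform(-0.5, 0.5) for _ in range(code_size)]
                    for _ in range(input_size)]

    def encode(self, x):
        return matvec(self.enc, x)      # long input -> short code

    def decode(self, code):
        return matvec(self.dec, code)   # short code -> long output

    def train_step(self, x, lr=0.05):
        code = self.encode(x)
        out = self.decode(code)
        err = [o - t for o, t in zip(out, x)]  # the target is the input itself
        # Error pushed back through the decoder, computed before updating it.
        code_grad = [sum(self.dec[i][j] * err[i] for i in range(len(err)))
                     for j in range(len(code))]
        for i in range(len(err)):
            for j in range(len(code)):
                self.dec[i][j] -= lr * err[i] * code[j]
        for j in range(len(code)):
            for k in range(len(x)):
                self.enc[j][k] -= lr * code_grad[j] * x[k]
        return sum(e * e for e in err) / len(err)  # mean squared error


def train_autoencoder(samples, epochs=500):
    model = TinyAutoencoder(input_size=len(samples[0]), code_size=2)
    losses = []
    for _ in range(epochs):
        epoch_loss = sum(model.train_step(x) for x in samples) / len(samples)
        losses.append(epoch_loss)
    return model, losses


# Toy "pictures": 4 numbers each, but they only vary in 2 independent ways,
# so a 2-number code is enough to describe them.
samples = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1], [1, 0.5, 1, 0.5]]
model, losses = train_autoencoder(samples)
```

The reconstruction error in `losses` shrinks as training goes on, which is the whole trick: the network is forced to squeeze everything it needs into the short code, because that is the only thing the decoder gets to see.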
We train our model on our target pictures, so our network will only learn how to decode a short representation into our target person’s face. It means that even if our input is a different face, our model will convert it into the target person’s face.
Now we can identify faces in our target video (a video is just a bunch of pictures), then run them through our trained model, converting the original actor into the target person.
In this phase we take the new pictures created by the autoencoder. In the alignment phase we saved their original orientation and size, so by reversing that transformation we can put them back into the original picture. Out of the modified pictures we can recreate our video.
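The loop over the video can be sketched like this. It is a deliberately simplified illustration: frames are plain lists of rows, a face position is just an upright box `(top, left, height, width)`, and rotation and scaling are ignored, so "reversing the transformation" boils down to pasting the patch back where it was cut out. The functions `crop`, `paste` and `swap_faces` are hypothetical names, and `detect_faces` and `convert_face` stand in for the face detector and the trained autoencoder.

```python
def crop(frame, box):
    """Cut the face patch described by box out of the frame."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def paste(frame, patch, box):
    """Return a copy of the frame with the patch put back at box."""
    top, left, height, width = box
    result = [row[:] for row in frame]  # leave the original frame untouched
    for dy in range(height):
        result[top + dy][left:left + width] = patch[dy]
    return result


def swap_faces(frames, detect_faces, convert_face):
    """Run every detected face of every frame through the converter."""
    output = []
    for frame in frames:
        for box in detect_faces(frame):
            new_face = convert_face(crop(frame, box))
            frame = paste(frame, new_face, box)  # same box, so same position
        output.append(frame)
    return output


# Stand-ins for the real components: one fixed "face" box per frame, and a
# converter that simply paints the patch with the value 9.
blank_frame = [[0] * 4 for _ in range(4)]
result = swap_faces(
    [blank_frame],
    detect_faces=lambda frame: [(1, 1, 2, 2)],
    convert_face=lambda patch: [[9] * len(row) for row in patch],
)
```

In a real pipeline the saved box would come with a rotation angle and a scale factor as well, and the converted face would be warped back with the inverse of the alignment warp before being blended into the frame.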
And that’s all, our new deepfake video is ready.
Certainly there are some more bells and whistles here and there, but the basic algorithm works just like this.
Where are the hot girls, you might ask? Check part 2 of this article to find out.
Source: Deep Learning on Medium