Original article can be found here (source): Artificial Intelligence on Medium
In some swapped faces, the skin tone looks unnatural.
Or is it just a bad tanning session of the celebrities? 😂
One way to mitigate this problem is to select candidates with similar skin tones, hairstyles, and face shapes for the swap.
Here, Paul Rudd’s face is replaced by Jimmy Fallon’s face.
In addition, candidates who are good at impersonating people’s voices, gestures, and expressions are preferred.
When we merge the replaced face with the original face, if the mask or the merging is not done properly, we may see two sets of eyebrows: one from the new face and the other from the original face.
A double chin can also appear, but it is harder to tell whether it is natural if you do not know the original person well.
While trying to spot abnormalities in the facial area, we can also compare the face with other parts of the body. Obviously, you cannot put a 60-year-old actor’s face on a twenty-something actress, in this case Jennifer Lawrence. The skin texture and smoothness of the arms will not match the face.
In general, look for differences in tone, sharpness, and texture between the impersonated face and the rest of the current video frame.
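As a rough do-it-yourself check for sharpness mismatches, you can compare the variance of a Laplacian filter between the face region and another part of the frame (higher variance means more fine detail). This is a minimal numpy sketch, not a tool from the article; the toy arrays below stand in for a cropped face region and a flat background region:

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a simple 4-neighbour Laplacian; higher means sharper."""
    lap = (-4 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return lap.var()

rng = np.random.default_rng(0)
textured = rng.random((64, 64))        # stands in for a detailed skin region
flat = np.full((64, 64), 0.5)          # stands in for an over-smoothed face
assert laplacian_variance(textured) > laplacian_variance(flat)
```

If the swapped face scores much lower than the arms or background of the same frame, that sharpness gap is exactly the kind of mismatch described above.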
Besides spatial inconsistencies, we can also explore temporal inconsistencies.
One of the major weaknesses of Deepfakes is that video frames are generated independently, frame by frame. Such independence may produce frames with noticeably different tones, lighting, and shadows compared with the previous frame. When the video is played back, flickering occurs.
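This flicker can be made measurable: a large average brightness jump between consecutive frames hints at independently generated frames. The sketch below is a hypothetical, minimal illustration in numpy; real footage would first be decoded into per-frame arrays:

```python
import numpy as np

def flicker_score(frames):
    """Mean absolute change in average brightness between consecutive
    frames. Independently generated frames tend to show larger jumps."""
    means = np.array([f.mean() for f in frames])
    return np.abs(np.diff(means)).mean()

# Toy example: a smooth fade vs. frames with random tone jumps.
smooth = [np.full((8, 8), 0.5 + 0.01 * t) for t in range(10)]
rng = np.random.default_rng(1)
jumpy = [np.full((8, 8), 0.5 + 0.2 * rng.standard_normal())
         for _ in range(10)]
assert flicker_score(jumpy) > flicker_score(smooth)
```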
Sometimes the quality of the replaced frames is so bad that the bad frames are removed, manually or automatically. If not too many frames are skipped, you may not notice it unless you pay close attention.
We take a couple of snapshots below. Even though they are very close in time, their sharpness and tones are noticeably different.
The diagram below shows two more frames with quite different RGB distributions.
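Comparing RGB distributions like this can be automated with per-channel histograms and a histogram-intersection score, where 1.0 means identical distributions. This is a small illustrative numpy sketch, not the tool used for the diagram; frames are assumed to be HxWx3 arrays with values in [0, 1]:

```python
import numpy as np

def rgb_hist(frame, bins=32):
    """Per-channel normalized histograms of an HxWx3 image in [0, 1]."""
    return np.stack([
        np.histogram(frame[..., c], bins=bins, range=(0, 1))[0]
        / frame[..., c].size
        for c in range(3)
    ])

def hist_intersection(frame_a, frame_b):
    """1.0 = identical colour distributions; lower = bigger tone shift."""
    return np.minimum(rgb_hist(frame_a), rgb_hist(frame_b)).sum() / 3.0

rng = np.random.default_rng(2)
frame = rng.random((32, 32, 3))
shifted = np.clip(frame + 0.3, 0, 1)   # simulate an abrupt tone jump
assert hist_intersection(frame, frame) == 1.0
assert hist_intersection(frame, shifted) < 0.9
```

Neighboring frames of genuine footage normally score close to 1.0, so a sudden dip flags the kind of tone jump shown in the diagram.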
If you play back the video below at 0.25x speed, you can see skin shimmering and unnatural tone changes when the head moves.
In Deepfakes, quick movements often make it hard to generate frames with proper temporal smoothness. The decoder may incorrectly exaggerate the changes in the latent factors between neighboring frames. This is not easy to solve unless we add an extra term to the cost function that penalizes such temporal jitter during training.
In Deepfakes, there are areas you should pay special attention to when spotting fake videos. One is the border of the face, where it merges with the original image.
But for more serious productions, the artifacts will be less noticeable or unobservable.