Source: Deep Learning on Medium
This type of technology has led to several interesting prototypes of next-generation image editing software.
Face editing software using SC-FEGAN (paper: SC-FEGAN: Face Editing Generative Adversarial Network with User’s Sketch and Color):
Still waiting for a product using StarGAN!
GANs are already moving to multi-modal cases, for example, generating images from an English sentence:
It’s not yet possible to create a movie from its description (a description probably written by another generative model, say, GPT-5; we’ll talk about text later), but the trend is obvious.
GANs then started to be applied to video, for motion transfer and face swapping.
Detecting faces and tracking people’s movements (including body, eye, and lip movements) made it possible to transfer personal traits to other people or to generate artificial personas with desired characteristics.
A 2018 paper called “Everybody Dance Now” describes video-to-video translation using the pose as an intermediate representation.
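The idea of the pose as an intermediate representation can be sketched as a two-stage pipeline: a pose estimator extracts joint positions from the source dancer, and a generator trained on the target person renders those poses as new frames. Everything below is a placeholder to show the data flow only: the stub functions, random weights, and tiny frame sizes are my own illustration, not the models from the paper (which uses a real pose detector and a GAN generator).

```python
import numpy as np

rng = np.random.default_rng(1)

N_JOINTS = 18                 # COCO-style 2D keypoints
H = W = 32                    # tiny stand-in "frames"

def estimate_pose(frame):
    """Stub pose estimator: source frame -> 2D joint coordinates."""
    return rng.uniform(0.0, 1.0, size=(N_JOINTS, 2))

# Stub "generator" trained on the target person: a random linear map
# from a flattened pose vector to a flattened image.
W_gen = rng.standard_normal((H * W, N_JOINTS * 2)) * 0.1

def render_target(pose):
    """Stub generator: pose -> image of the target person."""
    return (W_gen @ pose.ravel()).reshape(H, W)

# Motion transfer: poses come from the *source* dancer, pixels come
# from the generator trained on the *target* person.
source_video = [rng.standard_normal((H, W)) for _ in range(3)]
target_video = [render_target(estimate_pose(f)) for f in source_video]

print(len(target_video), target_video[0].shape)  # 3 (32, 32)
```

Because the pose skeleton carries no appearance information, the same pose sequence can in principle drive a generator trained on any target person; that decoupling is what makes the intermediate representation useful.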
Another 2018 technology, NVIDIA’s vid2vid, allows creating high-resolution, photorealistic, temporally coherent videos from a diverse set of input formats, including segmentation masks, sketches, and poses. The results are pretty impressive:
A very recent September 2019 paper performs human motion imitation, appearance transfer, and novel view synthesis within a unified framework: once trained, the model can handle all three tasks:
Face-swapping and reenactment
As for face swapping, face reenactment, and face generation, you have surely heard of the two most famous examples: DeepFakes and the fake Obama speech.
The former led to a wide ban on “involuntary synthetic pornographic imagery” across online platforms.
The original DeepFake emerged in November 2017. The first version was just a plain convolutional autoencoder (no GAN whatsoever). Both building blocks were well known and had been used successfully for many years; it’s strange that we saw it only a couple of years ago, because the technology had been ready for a long time. DeepFakes with GANs came later.
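The core trick of that first autoencoder-based version can be sketched with one shared encoder and two identity-specific decoders. The linear layers, dimensions, and random weights below are illustrative placeholders only, not the original network (which used convolutional layers trained on aligned face crops):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: flattened 64x64 grayscale faces, 128-dim latent code.
D, Z = 64 * 64, 128

# One shared encoder, one decoder per identity. Weights are random
# here; the real DeepFake trains them by reconstruction on face crops.
W_enc = rng.standard_normal((Z, D)) * 0.01
W_dec_a = rng.standard_normal((D, Z)) * 0.01  # decoder for person A
W_dec_b = rng.standard_normal((D, Z)) * 0.01  # decoder for person B

def encode(x):
    return np.tanh(W_enc @ x)      # shared latent face representation

def decode(z, W_dec):
    return W_dec @ z               # identity-specific reconstruction

# Training minimizes ||decode(encode(x_a), W_dec_a) - x_a||^2 for A's
# faces and the analogous loss for B, so the encoder learns features
# common to both. The swap happens at inference by crossing the wires:
face_a = rng.standard_normal(D)            # a frame of person A
swapped = decode(encode(face_a), W_dec_b)  # rendered as person B

print(swapped.shape)  # (4096,)
```

The whole effect rests on the shared encoder: because both decoders read the same latent space, a code extracted from person A’s expression can be rendered with person B’s appearance.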
About half a year later, Deep Video Portraits was presented at SIGGRAPH 2018. It enabled photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches, which were restricted to manipulating facial expressions only, it was the first to transfer the full 3D head position, head rotation, facial expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor.
The technology behind both is constantly improving. It is now possible to edit recorded speech using just text, or to create realistic photos (or videos) from a single image.
Deepfake artist Hao Li, who created a Putin deepfake for MIT Technology Review’s EmTech conference, said in September 2019 that “perfectly real” manipulated videos are just six to twelve months away from being accessible to everyday people.
An August 2019 paper on FSGAN (Subject Agnostic Face Swapping and Reenactment) produces very compelling face-swapping and reenactment results in videos:
ZAO is a popular Chinese face-swapping application that can place you into scenes from movies and TV shows after you upload just a single photograph:
At the end of 2018, Xinhua presented the first AI news anchor at the fifth World Internet Conference in east China’s Zhejiang Province. The AI anchor was jointly developed by Xinhua News Agency, the official state-run media outlet of China, and the Chinese search engine company Sogou.com.
The anchor, based on the latest AI technology, has the image of a male presenter, with the voice, facial expressions, and gestures of a real person. “He” learns from live broadcast videos and can read text as naturally as a professional news anchor.