Deepfake: The Good, The Bad and the Ugly

Source: Deep Learning on Medium


By Nahua Kang

This blog post is featured in the newsletter Embodied AI, your bi-weekly digest on the latest news, tech, and trends for AI avatars, virtual beings, and digital humans.


In a scene from Steven Spielberg’s Catch Me If You Can, FBI agent Carl Hanratty tracks down the fraudster Frank Abagnale Jr. and breaks into his hotel room. Despite a gun’s barrel in his face, Abagnale masquerades as a fellow law enforcement agent who is also after the conman, fooling Hanratty and slipping through his fingers. Navigating between real and fake, the movie is about a good cop catching an ingenious kid on a bad path. Now, the same story is being played out in the media, manipulating and attacking our ability to tell truth from lies, real from fake, and righteousness from cynicism. Meet the digital conman in question: the deepfake.

Frank Abagnale Jr. (Leonardo DiCaprio) pretending to be a Secret Service agent (Credit: Catch Me If You Can)

A GAN of Fakes

Deepfake is the combination of deep learning and fake — a portmanteau, if you will. You might have already visited the site This Person Does Not Exist, which utilizes AI to generate profile pictures of fake humans. The secret sauce in making these uncanny, fake images is the Generative Adversarial Network (GAN) algorithm, which has its own Abagnale and Hanratty.

How Generative Adversarial Network (GAN) works: A Catch-Me-If-You-Can illustration

A GAN consists of a generator and a discriminator. Imagine we have a GAN that’s trained to create fake images of pilots. The generator, our Abagnale, attempts to forge fake pilot pictures. The discriminator, our Hanratty, looks at both the authentic images and the forged ones created by the generator, training itself to tell the real from the fake. As the GAN continues training, both the generator and the discriminator get progressively better at their jobs, pushing each other to new levels of forgery and detection. Deepfakes are the forgeries of the generator that dodge the scrutiny of the discriminator. A true “fake it until you make it” story!
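To make the Abagnale-versus-Hanratty loop concrete, here is a minimal sketch in plain Python. It is a toy and not how real deepfake models are built: the "images" are single numbers, the authentic data is assumed to be drawn from a bell curve centered at 4, the generator is a straight line with two parameters (`a`, `b`), and the discriminator is a one-input logistic classifier (`w`, `c`). All names, the learning rate, and the data distribution are illustrative assumptions.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def generate(gen, z):
    # Generator ("Abagnale"): turns random noise z into a forged sample.
    return gen["a"] * z + gen["b"]


def discriminate(disc, x):
    # Discriminator ("Hanratty"): estimated probability that x is authentic.
    return sigmoid(disc["w"] * x + disc["c"])


def discriminator_step(disc, real, fake, lr):
    # Gradient ascent on mean log D(real) + mean log(1 - D(fake)).
    grad_w = grad_c = 0.0
    for x in real:
        p = discriminate(disc, x)
        grad_w += (1.0 - p) * x / len(real)
        grad_c += (1.0 - p) / len(real)
    for x in fake:
        p = discriminate(disc, x)
        grad_w -= p * x / len(fake)
        grad_c -= p / len(fake)
    disc["w"] += lr * grad_w
    disc["c"] += lr * grad_c


def generator_step(gen, disc, noise, lr):
    # Gradient ascent on mean log D(G(z)): the forger wants to be rated "real".
    grad_a = grad_b = 0.0
    for z in noise:
        p = discriminate(disc, generate(gen, z))
        grad_a += (1.0 - p) * disc["w"] * z / len(noise)
        grad_b += (1.0 - p) * disc["w"] / len(noise)
    gen["a"] += lr * grad_a
    gen["b"] += lr * grad_b


def train(steps=200, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    gen = {"a": 1.0, "b": 0.0}   # forgeries start out centered at 0
    disc = {"w": 0.0, "c": 0.0}  # the detective starts out clueless
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]  # assumed "authentic" data
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [generate(gen, z) for z in noise]
        discriminator_step(disc, real, fake, lr)  # Hanratty studies the evidence
        generator_step(gen, disc, noise, lr)      # Abagnale adapts to Hanratty
    return gen, disc
```

Calling `train()` returns the final generator and discriminator parameters. With a discriminator this simple, the generator's offset `b` may keep circling the real data's center rather than settling on it, which is a small-scale version of the cat-and-mouse dynamic described above.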

The Bad: Fraud, Humiliation and Misinformation

Alarmingly, deepfake technology can now create fake videos. Deepfake videos are still in their early days and must be edited on an image-by-image basis, generating only aesthetic changes without creating original motions. Already in 2017, though, an anonymous redditor under the pseudonym “deepfakes” overlaid Gal Gadot’s face onto a porn video. Another redditor created an app, FakeApp, to streamline the process of faking videos. Now, precisely overlaying a person’s face on a video is easier than ever before.

Filmmaker Jordan Peele deepfaked an Obama speech to warn of the threats of deepfakes

With 93 million selfies taken every day, one can only imagine how easy it could be to abuse deepfakes for cynical, even criminal uses: revenge porn, blackmail, identity theft, and the spread of misinformation are examples we can imagine now. In theory, everyone who posts photos on social media is vulnerable. Videos such as the fake Obama speech can be generated at scale, tricking those who are susceptible to misinformation and further dividing today’s polarized society. An unregulated and unfiltered internet and social media landscape full of deepfakes can harm our private lives and society at large.

The Ugly: A Downward Spiraling Race

Like any deep learning task, data collection is crucial for realistic deepfakes. Kevin Roose, a tech columnist at The New York Times, tried FakeApp by collecting video footage of himself, with entertaining if unconvincing results. For now, this implies that only public figures with plenty of high-quality video footage will be the targets of convincing deepfakes. But deepfakes of politicians are potentially the most impactful form of fake news and could negatively affect, for example, the 2020 US elections.

This development also places a heavier weight on technology that detects deepfake content. That’s why we have curated 6 tools that can help you detect deepfakes:

But deepfakes are like computer viruses: as soon as someone finds a way to detect them, someone else will find a way around it. The game of hide-and-seek between fraudsters and detectors spirals downward, as if the two constituted one gigantic GAN.

The Good: Humanlike AI for a new form of communication

Despite its dark side, deepfake also has positive uses that can help society, such as enabling new forms of communication. Take voice generation: Google Assistant can now speak like John Legend by utilizing WaveNet, a generative model for speech generation. Startups Lyrebird and Modulate can learn to talk like you with a few hours of speaking audio. Bernard Marr reports that Baidu’s technology needs only 3.7 seconds of audio to clone a voice. Soon we’ll have smart speakers that talk like our favorite singers, or our own virtual selves who represent us when we are out of office.

In video generation, the startup DataGrid has generated entire bodies of humans who don’t exist and London-based Synthesia generated a David Beckham deepfake for a malaria campaign. Researchers at UC Berkeley have created deepfakes for dancing. Most recently, the Dalí Museum in Florida leveraged deepfake to resurrect the Spanish surrealist in a kiosk where visitors can interact with him.

Full video here. (Credit: The Dalí Museum)

Imagine resurrecting other famous historical figures who have inspired us or left positive impacts on society. Imagine resurrecting our loved ones who passed away. And, for Harry Potter fans, imagine making those living Hogwarts Portraits come true!

TwentyBN’s CEO and founder Roland Memisevic believes that deepfake is a crucial building block in creating humanlike AI agents

The progress on these frontiers reveals the potential of deepfake as a crucial building block in creating humanlike AI characters on-screen. Roland Memisevic, CEO & founder of TwentyBN, believes that by combining deepfake techniques, video understanding (with TwentyBN’s Something-Something dataset and HyperModel for action recognition), dialogue systems like Google Duplex, and language models like OpenAI GPT-2, we will be able to build realistic digital humans who can conduct face-to-face conversations that are indistinguishable from natural human conversations.

La Fin

The real Abagnale, as in the film, was released from prison and served over four decades at the FBI, working on counterfeits and forgeries, embezzlement and financial crimes, and cybersecurity crimes. “Technology breeds crime. It always has, it always will,” he said in a talk at Google. But according to him, each breach is caused not by hackers but by people who either did something they weren’t supposed to do or failed to do something they were supposed to do.

The same lesson applies to deepfake: No matter how realistic deepfakes become or how accurate anti-deepfake technology gets, the real damage caused by deepfakes is done by humans who create, falsely believe, and spread them. Instead of blaming technology, we must find creative solutions to help people think more critically about the information they digest online and learn to share more smartly on social media. While being conscious of deepfake’s negative impacts, we should look to the better angels of AI and harness deepfake’s potential to create new forms of communication that will improve our lives.