Source: Deep Learning on Medium
“Generative Adversarial Networks is the most interesting idea in the last 10 years in Machine Learning.”
— Yann LeCun, Director of AI Research at Facebook AI
There’s no doubt that GANs are exciting right now: they are the first generative algorithms to give convincingly good results, and those results will likely only get better with time. So it is tempting to try your hand in this field full of opportunities and constant advancement, but nothing is that simple.
Since the field is very young (the first paper appeared in 2014) and the number of papers and applications is vast, getting started with GANs can be very challenging. So, what is the best path for learning them? Today, I’m going to shed some light on that!
In this post, I will cover the following points:
- What are Neural Networks?
- What are GANs?
- How do GANs work?
- Applications of GANs
- The best stuff to study GANs
Without further ado, let’s jump right in!
What are Neural Networks?
When it comes to GANs, the first association is deepfakes: photos and videos in which people’s faces (most often celebrities’) are placed on other people’s bodies, like the viral “The Shining” deepfake starring Jim Carrey as Jack Torrance. How does it work? The answer is simple: the power of neural networks. That’s why we are starting our learning journey with this topic.
So what is a neural network? Simply put, it is a program that performs a certain kind of task. Why such a grand name? Because its structure loosely resembles the way neurons work in the human brain. But while regular programs have all their settings fixed by a person, neural networks learn their behavior: training adjusts tens of millions of parameters automatically.
By the way, if you have ever had to select all the images containing road signs, congratulations: you had a hand in training neural networks and artificial intelligence. Google’s CAPTCHA is designed not only to weed out robots (and make us doubt our own humanity). Each time a user marks objects in a photo, Google’s neural networks become a little smarter: image search becomes more precise, Google Photos gets better at recognizing faces, places, and objects in your photo library, and Waymo’s self-driving cars get a little better at not running over people and cats.
So, converting one image into another is a standard operation for a neural network. When a network recognizes faces, objects, or sounds, its output is a vector or a numerical value assigning the input to one class of objects or another. But nothing prevents us from setting up a network so that the result is not a number but a set of pixels. The question then becomes: how do we make this set of pixels look like a real photo or picture rather than a meaningless abstraction?
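To make this concrete, here is a minimal NumPy sketch (all layer sizes and weights are arbitrary placeholders, not a real trained model): the very same internal features can be mapped either to a vector of class scores or to a grid of pixels, depending only on the shape of the output layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "output heads": 64 internal features mapped either to
# 10 class scores (recognition) or to 28x28 pixels (generation).
W_cls = rng.normal(size=(64, 10))
W_pix = rng.normal(size=(64, 28 * 28))

features = rng.normal(size=64)  # stand-in for learned features

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Recognition head: a probability for each class, summing to 1
class_probs = softmax(features @ W_cls)

# Generation head: a set of pixels in [-1, 1]
image = np.tanh(features @ W_pix).reshape(28, 28)

print(class_probs.shape, image.shape)
```

The architecture up to the last layer can be identical; only the interpretation of the output changes.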
And this is exactly where generative adversarial networks (GANs) enter the game.
What Are Generative Adversarial Networks?
Generative adversarial networks are a class of neural networks used in unsupervised machine learning. They were invented by Ian Goodfellow and his colleagues in 2014.
At the core of a GAN is a pair of neural networks. Simply put, one is trained to create images of some object, and the other to distinguish real images from artificially created ones. The first network tries to trick the second, and with each failed attempt it receives feedback and gets better at the task. One of them generates, the other criticizes. Together, in perfect collaboration, they give excellent results in synthesizing plausible images and enhancing existing ones.
https://thispersondoesnotexist.com/ presents a random, computer-generated photo of a fictional person. Refresh the page each time for a new face.
The first network is called the “generator” and the second the “discriminator”, and they are trained together. The generator’s task is to produce images of a given category; the discriminator’s task is to tell the created images apart from real ones.
For example, suppose we have pictures of faces. The discriminator, in that case, tries to determine whether a given image is a real face or not, and over time the generator learns to produce realistic faces.
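The adversarial game described above can be shown end to end on a deliberately tiny example. The sketch below is my own toy illustration, not any paper’s implementation: the “real data” is just numbers drawn from a normal distribution around 4, the generator is a one-parameter-pair affine map, and the discriminator is logistic regression. The gradient formulas are worked out by hand for this specific setup.

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0   # generator:      g(z) = a*z + b
w, c = 0.0, 0.0   # discriminator:  D(x) = sigmoid(w*x + c)
lr, batch = 0.05, 64

for step in range(2000):
    x_real = rng.normal(4.0, 1.0, batch)   # "real" samples near 4
    z = rng.normal(0.0, 1.0, batch)        # input noise
    x_fake = a * z + b                     # generated samples

    # Discriminator step: push D(real) toward 1, D(fake) toward 0
    s_r = sigmoid(w * x_real + c)
    s_f = sigmoid(w * x_fake + c)
    w -= lr * np.mean(-(1 - s_r) * x_real + s_f * x_fake)
    c -= lr * np.mean(-(1 - s_r) + s_f)

    # Generator step: push D(fake) toward 1 (non-saturating loss)
    s_f = sigmoid(w * x_fake + c)
    dx = -(1 - s_f) * w          # gradient of -log D(x_fake) w.r.t. x_fake
    a -= lr * np.mean(dx * z)
    b -= lr * np.mean(dx)

# The mean of the generated distribution (b) drifts toward the real mean
print(f"generated mean is about {b:.2f}; real mean is 4.0")
```

Each iteration alternates one discriminator update with one generator update, which is exactly the back-and-forth the text describes: the critic sharpens, the forger adapts, and the forger’s output distribution migrates toward the real one.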
How Do GANs Work?
To understand how GANs work in more detail, let’s look at what the discriminator and the generator actually do.
The “discriminator”, or “discriminative network”. For recognition, convolutional neural networks (CNNs) are typically used. What are they? Let me clear it up: a CNN can recognize objects in pictures, for example extracting faces or digits from an entire image. To make a neural network recognize something, you train it on a large number of images that contain the desired objects.
For instance, you give the network a large number of pictures of cats and mark the parts of each image where the cats are. After training, the network can tell two kinds of pictures apart: those with cats and those without.
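The core operation a CNN relies on for this kind of recognition is the 2-D convolution: sliding a small kernel over the image and measuring how strongly each patch matches it. Here is a bare-bones NumPy sketch (the image and kernel are made-up toy values) showing a vertical-edge kernel lighting up exactly where the image changes:

```python
import numpy as np

def conv2d(img, kernel):
    # Valid-mode 2-D cross-correlation: the core op of a CNN layer
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark on the left, bright on the right
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# A kernel that responds to left-to-right intensity changes
edge_kernel = np.array([[-1.0, 1.0]])

response = conv2d(img, edge_kernel)
print(response.max())  # strongest response sits on the edge column
```

A real discriminator stacks many such learned kernels with nonlinearities and pooling, but the principle of detecting local patterns and composing them into “cat / not cat” decisions is the same.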
The “generator”, or “generative network”. Image formation begins with arbitrary noise, in which fragments of the desired image gradually start to appear. Imagine shaking a plate of sand until you manage to “shake out” something vaguely reminiscent of a digit, and then continuing to shake until the contours become more pronounced. The network remembers exactly how you shook the plate to achieve the result, and next time it reproduces those actions.
Naturally, this is an illustrative example, and more principled approaches are used in real models. Feed-forward neural networks (FFNNs) can be used, and often are used, as the generating network.
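A feed-forward generator really is this simple in outline: noise goes in, pixels come out. Below is a minimal NumPy sketch with arbitrary, untrained weights (in a real GAN these weights would be learned through the adversarial game, so the output here is just structured noise):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny feed-forward generator: 16-dim noise -> 28x28 "image".
# Layer sizes are arbitrary; a trained GAN would learn these weights.
W1 = rng.normal(scale=0.1, size=(16, 64))
W2 = rng.normal(scale=0.1, size=(64, 28 * 28))

def generate(z):
    h = np.maximum(0.0, z @ W1)             # ReLU hidden layer
    return np.tanh(h @ W2).reshape(28, 28)  # pixels in [-1, 1]

z = rng.normal(size=16)   # the "shaken sand": pure random noise
img = generate(z)
print(img.shape)
```

Training tunes `W1` and `W2` so that this mapping turns any noise vector into something the discriminator mistakes for a real image.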
So, how to make a neural network generate Jim Carrey’s face?
For this, we need not ordinary generative adversarial networks but so-called conditional GANs. To get believable photographs of Jim Carrey (or any other person), the network is first trained on photographs of that person; it then learns to produce images of the actor’s face that are almost impossible to distinguish from real ones.
For the face to speak, turn, and blink, as in the well-known viral video, the network is also trained on photographs of people with different facial expressions — thank God, we all grimace in rather similar ways.
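The “conditional” part boils down to feeding the generator an extra input alongside the noise. In this NumPy sketch (the one-hot “facial expression” label, the sizes, and the weights are all made-up placeholders), the condition vector is simply concatenated with the noise, which is one common way conditional GANs steer the output:

```python
import numpy as np

rng = np.random.default_rng(1)

n_noise, n_cond = 16, 3   # 3 hypothetical expressions: smile, frown, blink
W = rng.normal(scale=0.1, size=(n_noise + n_cond, 8 * 8))

def generate(z, cond):
    x = np.concatenate([z, cond])   # the condition joins the noise input
    return np.tanh(x @ W).reshape(8, 8)

smile = np.array([1.0, 0.0, 0.0])   # one-hot "smile" condition
out = generate(rng.normal(size=n_noise), smile)
print(out.shape)
```

During training, the discriminator also sees the condition, so the generator is rewarded only when the output both looks real and matches the requested label.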
What are some practical applications of generative adversarial networks?
Well, there are lots of them. GANs made impressive progress in their first few years of development: no more stamp-sized facial pictures like something out of a horror movie. By 2017, GANs could produce 1024 × 1024 images that can fool a talent scout. GANs have been used for image generation, inpainting, photo synthesis from text, image editing, and many other applications, often leading to state-of-the-art results.
Here is a brief overview of some practical applications of generative adversarial networks.
Content and data creation: pictures for an online store, avatars for games, videos generated automatically from music snippets, or even virtual hosts for TV programs. GANs can also synthesize data on which other systems can then be trained.
Automatic editing: this approach is already used on modern smartphones and in some apps. It lets you change facial expressions, smooth out or add wrinkles, change your hair color, turn day into night, and more.
GAN Neural Network Applications — Real Inspirational Examples
Generating stunningly realistic images of “celebrities” (actually non-existent people) using Nvidia’s Progressive Growing GAN (PGGAN). The same network can also generate images of other categories.
All these images are generated by a system based on generative adversarial networks; some of them do not look too realistic, but others are very believable.
The Everybody Dance Now model, created by a team of researchers from UC Berkeley, presents a simple “do as I do” method of motion transfer based on generative neural networks. Given a source video of a dancing person, it makes the image of another person perform the same dance.
Style transfer from one image to another lets neural networks do such impressive things as “turn a horse into a zebra”.
Or generate “anime portraits” from a photograph. In this picture you can see how different types of GANs cope with this task.
Changing a person’s emotions, age, or facial expression: all of this can be achieved by properly training a GAN. In practice, it looks like this: the original photo is fed to the model’s input along with the emotion that should be shown at the output.
GANs are also used to generate realistic video of urban environments, for example when creating films, games, or virtual reality.
Converting sketches and contour drawings into photorealistic images with a GAN works as follows: you draw a face, a bag, or, say, a cat by hand and get a photorealistic image at the output. You can try it here.
Best Resources for Generative Networks
Delving deeper is worth it, but to acquire solid knowledge you need to choose the right sources. Here is my selection of the best videos, books, and articles for this purpose.
Video lectures and presentations
#1 Ian Goodfellow: Generative Adversarial Networks (NIPS 2016 tutorial)
Accessible to an audience with no prior experience with GANs, it will prepare you to make original research contributions, whether applying GANs or improving the core algorithms. Topics include a review of work applying GANs to large-image generation, improved model architectures that yield better learning, semi-supervised learning with GANs, and more.
Accompanying slides and a paper version of the tutorial are also available.
#2 AAAI-19 Invited Talk — Ian Goodfellow (Google AI) — Adversarial Machine Learning
A more recent presentation on the broader topic of adversarial machine learning that also covers GANs.
#3 Lecture on Generative Models from the Stanford course on Convolutional Neural Networks
This lecture provides a useful context for GANs as well as coverage of the related techniques of Variational Autoencoders and PixelRNN.
Books

#1 Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016

Possibly the most powerful starting point is this textbook, co-written by Goodfellow himself. Chapter 20, titled “Deep Generative Models”, gives a useful summary of a range of techniques, including GANs.
#2 Generative Adversarial Networks Cookbook by Josh Kalin, 2019
This book leads you through eight examples of modern GAN implementations, including CycleGAN, SimGAN, DCGAN, and imitation learning with GANs. Each chapter builds on a common architecture in Python and Keras to explore increasingly difficult GAN architectures in an easy-to-read format.
#3 Generative Deep Learning by David Foster, 2019
You will learn how to recreate some of the most famous examples of generative deep learning models, such as variational autoencoders and generative adversarial networks (GANs). You’ll also learn how to apply the techniques to your own datasets.
#4 GANs in Action by Jakub Langr, Vladimir Bok, 2019
First, you’ll get an introduction to generative modeling and how GANs work, along with an overview of their potential uses. Then, you’ll start building your own simple adversarial system, as you explore the foundation of GAN architecture: the generator and discriminator networks.
#5 Generative Adversarial Networks Projects by Kailash Ahirwar, 2019
This book summarizes a range of GANs with code examples in Keras.
Slides and tutorials
- Ian Goodfellow’s GAN Slides (NIPS Goodfellow Slides) — nice and brief explanation for beginners
- ICCV 2017 Tutorial on GANs — another great set of slides for beginners, from the ICCV tutorial held in Italy in 2017. It starts with introductory topics and continues with an exploration of state-of-the-art GAN models, models using adversarial losses, and more.
More relevant links
Books and Guides to master Deep Learning
Thanks for reading!