Learning Generative Adversarial Networks (GANs)
GANs were introduced in a paper by Ian Goodfellow and other researchers at the University of Montreal in 2014.
What is a GAN?
A generative adversarial network (GAN) is a type of neural network model that offers a lot of potential in the world of machine learning. A GAN contains two neural networks: a generative network and a discriminative network, and this adversarial pairing is the main concept behind the approach. GANs are about creating data, which makes them hard to compare to other deep learning fields: while most models learn from existing data, the main focus of a GAN is to generate data from scratch. As noted above, a GAN is composed of two networks, the generator and the discriminator.
Generative adversarial networks (GANs) are deep neural net architectures comprised of two nets, pitting one against the other (hence the name “adversarial”).
Facebook’s AI research director Yann LeCun called adversarial training “the most interesting idea in the last 10 years in ML.”
GANs’ potential is huge because they can learn to mimic any distribution of data. Using GANs, we can create worlds similar to our own in any domain: images, anime, news anchors, speech.
Generative vs. Discriminative Algorithms
To understand GANs, we need to know how generative algorithms work, and it helps to contrast them with discriminative algorithms. A discriminative algorithm tries to classify input data.
A standard example of this scenario is email: given all the words in an email, a discriminative model predicts whether the message is spam or not spam. In this example, spam is one of the labels, and the words of the email are the features that compose the input data. Expressed mathematically, the label is called y and the features are called x. The formulation p(y|x) means “the probability of y given x”.
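As a minimal sketch of this idea, here is how p(y|x) could be estimated by simple counting on a tiny toy corpus (the emails, words, and labels below are invented purely for illustration):

```python
from collections import Counter

# Tiny invented corpus: (words in the email, label).
emails = [
    (["win", "money", "now"], "spam"),
    (["win", "prize"], "spam"),
    (["meeting", "tomorrow"], "not spam"),
    (["project", "meeting", "notes"], "not spam"),
]

def p_label_given_word(word, label):
    """Estimate p(y = label | email contains word) by counting."""
    labels_with_word = [lab for words, lab in emails if word in words]
    if not labels_with_word:
        return 0.0
    return Counter(labels_with_word)[label] / len(labels_with_word)

p_spam_given_win = p_label_given_word("win", "spam")  # every "win" email here is spam
```

A real spam filter would of course use many features at once, but the conditional-probability question it answers is the same.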
The main question a generative algorithm tries to answer is the opposite: assuming this email is spam, how likely are these features? While the discriminative model cares about the relation between features (x) and label (y), the generative model cares about how the features are distributed for a given label.
So generative algorithms do the opposite of discriminative ones: instead of predicting a label given certain features, they predict the features given a certain label.
One way to distinguish generative from discriminative models is this:
- Discriminative models learn the boundary between classes
- Generative models model the distribution of individual classes
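To make the contrast concrete, a generative model of the same toy spam problem would model p(x|y) and let us sample new emails for a chosen label. This is only a sketch with invented per-class word frequencies:

```python
import random
from collections import Counter

random.seed(0)  # reproducible sampling

# Invented word frequencies per class: a crude model of p(x | y).
word_counts = {
    "spam": Counter({"win": 5, "money": 4, "prize": 3}),
    "not spam": Counter({"meeting": 6, "project": 4, "notes": 2}),
}

def generate_email(label, length=3):
    """Sample words from p(x | y = label): generation, not classification."""
    counts = word_counts[label]
    words = list(counts)
    weights = [counts[w] for w in words]
    return random.choices(words, weights=weights, k=length)

fake_spam = generate_email("spam")  # a few spam-flavored words
```

Where the discriminative sketch mapped features to a label, this one maps a label to plausible features.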
How GANs Work
As we know, these algorithms belong to the field of unsupervised learning.
Generative Adversarial Networks are composed of two models:
The first model is called the Generator, and its goal is to generate new data similar to the real data. The generator creates data, and the discriminator checks whether that data is real or fake.
The second model is called the Discriminator. This model’s goal is to recognize whether an input is real — belongs to the original dataset — or fake, i.e., generated by the generator. The discriminator is like the police, trying to detect whether a work is genuine or counterfeit.
When training begins, the generator produces fake data, and the discriminator quickly learns to tell that it’s fake.
After training, the generative model can then be used to create new plausible samples on demand.
GANs have very specific use cases and it can be difficult to understand these use cases when getting started.
How do these models interact?
In the original paper which proposed this framework, the Generator can be thought of as having an adversary, the Discriminator. That means the generator needs to learn to create data in such a way that the discriminator is no longer able to distinguish it from the real data. The competition between these two models is what improves them both, until the generator is creating realistic data.
Fundamental steps to train a GAN
I. Sample a noise set and a real data set, each of size m.
II. Train the Discriminator on this data.
III. Sample a different noise subset with size m.
IV. Train the generator on this data.
V. Repeat from step I.
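Steps I–V can be sketched end to end on toy data. The NumPy example below is only an illustration, not a real GAN: the “generator” is a linear map G(z) = a·z + b, the “discriminator” is a single logistic unit, and the target distribution, learning rate, and step count are all arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_real(m):
    # Toy "real data": 1-D samples from N(4, 0.5)
    return rng.normal(4.0, 0.5, size=m)

def sample_noise(m):
    return rng.normal(0.0, 1.0, size=m)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

w, c = 0.1, 0.0   # discriminator D(x) = sigmoid(w*x + c)
a, b = 1.0, 0.0   # generator G(z) = a*z + b
lr, m = 0.05, 64

for step in range(2000):
    # I. Sample a noise set and a real data set, each of size m.
    x = sample_real(m)
    z = sample_noise(m)
    g = a * z + b

    # II. Train the discriminator: ascend log D(x) + log(1 - D(G(z))).
    d_real, d_fake = sigmoid(w * x + c), sigmoid(w * g + c)
    grad_w = np.mean((1 - d_real) * x) + np.mean(-d_fake * g)
    grad_c = np.mean(1 - d_real) + np.mean(-d_fake)
    w += lr * grad_w
    c += lr * grad_c

    # III. Sample a different noise subset of size m.
    z = sample_noise(m)
    g = a * z + b

    # IV. Train the generator (non-saturating loss): ascend log D(G(z)).
    d_fake = sigmoid(w * g + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)
    # V. The loop repeats from step I.
```

With these settings the generator’s offset b drifts toward the real data’s mean, which is exactly the intended effect: the generator learns to place its samples where the discriminator can no longer tell them from real ones.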
Here are the steps a GAN takes:
- The generator takes in random numbers and returns an image.
- The generated image is fed into the discriminator alongside images taken from the actual dataset.
- The discriminator takes in both real and fake images and returns probabilities: numbers between 0 and 1, as in logistic regression, where 1 represents real and 0 represents fake.
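The discriminator’s 0-to-1 output works just like a logistic regression score. A tiny sketch (the features, weights, and bias below are made-up numbers for illustration):

```python
import math

def discriminator_score(features, weights, bias):
    """Squash a weighted sum through a sigmoid: near 1.0 means real, near 0.0 means fake."""
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

real_score = discriminator_score([0.9, 0.8], [2.0, 1.5], -1.0)   # positive logit -> score above 0.5
fake_score = discriminator_score([0.1, 0.05], [2.0, 1.5], -1.0)  # negative logit -> score below 0.5
```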
So we have two feedback loops:
- The discriminator is in a feedback loop with the ground truth of the images.
- The generator is in a feedback loop with the discriminator.
As we can see, both networks are dynamic. The discriminator is a standard convolutional network that categorizes the images fed to it, while the generator is an inverse (transposed) convolutional network.
A GAN is defined as a minimax game with the following objective function:
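From the original 2014 paper, the value function V(D, G) of this two-player minimax game is:

```latex
\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the probability the discriminator assigns to x being real, and G(z) maps noise z drawn from a prior p_z to a generated sample. The discriminator tries to maximize V while the generator tries to minimize it.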
Many GAN models suffer from major problems:
I. Non-convergence: the model parameters oscillate and never converge.
II. Mode collapse: the generator collapses and produces only a limited variety of samples.
III. Imbalance between the discriminator and generator, causing overfitting.
IV. High sensitivity to hyperparameter selection.
Applications of GANs
The examples below demonstrate several image translation cases:
- Translation from photograph to artistic painting style.
- Translation of horse to zebra.
- Translation of photograph from summer to winter.
- Translation of satellite photographs to Google Maps view.
- Text to Image translation (Han Zhang, et al. in their 2016 paper titled “StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks” demonstrate the use of GANs, specifically their StackGAN, to generate realistic-looking photographs from textual descriptions of simple objects like birds and flowers.)
- Semantic Image to Photo Translation (Ting-Chun Wang, et al. in their 2017 paper titled “High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs” demonstrate the use of conditional GANs to generate photorealistic images given a semantic image or sketch as input.)
- Photos to Emoji (Yaniv Taigman, et al. in their 2016 paper titled “Unsupervised Cross-Domain Image Generation” used a GAN to translate images from one domain to another, including from street numbers to MNIST handwritten digits, and from photographs of celebrities to what they call emojis or small cartoon faces.)
- Super Resolution (Christian Ledig, et al. in their 2016 paper titled “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network” demonstrate the use of GANs, specifically their SRGAN model, to generate output images with higher, sometimes much higher, pixel resolution.)
- Video Prediction (Carl Vondrick, et al. in their 2016 paper titled “Generating Videos with Scene Dynamics” describe the use of GANs for video prediction, specifically predicting up to a second of video frames with success, mainly for static elements of the scene.)
In this article, you discovered a gentle introduction to generative adversarial networks: what a GAN is, generative vs. discriminative algorithms, how GANs work, how the two models interact, the fundamental training steps, common GAN problems, and applications.
In my next article, I will briefly describe the mathematical way to define a GAN, some steps to improve the efficiency of the model, and cycle consistency in adversarial networks for image-to-image translation.
Thanks for reading!
Make sure to like/share this post 😊
Feel free to message me.
Thanks to Sarfaraz Jarda, who helped review this article!
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.