Making sense of Generative Adversarial Networks (GAN)

Original article was published by Atul Kumar on Deep Learning on Medium


Why we need a Generative Adversarial Network: –

If we show a neural network lots and lots of pictures of people and cars, and tell it which pictures show a car and which show a person, then the network eventually learns to differentiate between a person and a car. So, when you feed it a new picture of a car or a person, it will tell you whether it is a person or a car, as shown in Figure 1.1. Basically, the network constructs an internal structure (a representation of its inputs) that is meaningful if we look at it.

But if you ask this network to generate a new, unseen picture of a person or a car, it won’t be able to do that, as shown in Figure 1.2.

Convolutional Neural Networks

Most often, we need to generate new samples from the same input distribution, and for that we need a generative model.

Generative Network: –

Input Data To Generative Network

If we feed these three types of data to a generative neural network, the model that the network learns will look like Figure 3. When we try to generate a sample from this trained generative neural network, it will generate Figure 4, since this is similar to the average of all three input distributions. But by looking at it, we can say that this sample does not belong to any of the input data distributions. So, how do we solve this problem? The answer is randomness: generative models add randomness so that the generated results are indistinguishable from the real inputs.
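
To make the averaging problem concrete, here is a minimal Python sketch. It is only a toy under stated assumptions: the three cluster centres in `MODES`, the noise level and all function names are made up for illustration, and `random_output` simply picks a cluster at random instead of being a learned network.

```python
import random

MODES = [0.0, 2.0, 10.0]  # assumed centres of the three input distributions
NOISE = 0.1               # assumed spread of each input distribution


def make_data(n_per_mode, rng):
    """Draw n_per_mode one-dimensional points around every centre."""
    return [rng.gauss(m, NOISE) for m in MODES for _ in range(n_per_mode)]


def average_output(data):
    """A deterministic 'generator' that always returns the data mean."""
    return sum(data) / len(data)


def random_output(rng):
    """A 'generator' driven by randomness: pick a centre, then add noise."""
    return rng.gauss(rng.choice(MODES), NOISE)


def distance_to_nearest_mode(x):
    """How far a generated value is from the closest input distribution."""
    return min(abs(x - m) for m in MODES)


rng = random.Random(0)
data = make_data(200, rng)
avg = average_output(data)
samples = [random_output(rng) for _ in range(5)]

print("average output:", avg, "distance:", distance_to_nearest_mode(avg))
for s in samples:
    print("random output:", s, "distance:", distance_to_nearest_mode(s))
```

The averaged output lands between the clusters and belongs to none of them, while every randomly driven output stays close to one of the input distributions.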

Figure 3: Learned model, Figure 4: The output of the generative neural network

Adversarial Network: –

Suppose we want to train a network to correctly identify the digits from 0 to 9. We feed it lots and lots of images of the digits 0, 1, 2, and so on. During training, the network gets rewarded for a right prediction; for a wrong prediction it receives feedback and adjusts its weights accordingly. This process is repeated over and over for all the images of all the digits. But we humans do not learn like this.

For example, suppose you are a teacher teaching a child how to recognize the digits 0–9. For the digits 0, 2, 3, 4, 5, 6, 8 and 9 he gets the right answer 70% of the time. But when he gets the digits 1 and 7, he is 50–50 (he is not able to tell them apart), because 1 and 7 look similar to him. When you notice this, you start focusing on 1 and 7, where he has the most trouble. With humans this requires balance, because if you keep asking the same thing that he keeps failing at, he will eventually get demotivated and give up. That is not the case with neural networks, because a neural network doesn’t have feelings. We can train the network on these errors again and again until the error comes down to the level of the other digits.

It is true that some people may have faced a situation where a teacher keeps asking a student the very thing he keeps failing at, which makes him feel that the teacher wants him to fail. That behavior is, in effect, adversarial behavior.

How do we create a similar scenario in neural networks? We can actually build an adversarial network.

Suppose there is some process that genuinely does its best to make the neural network produce as high an error as possible. If it spots any weakness, it focuses on that weakness, and so it forces the learner to learn not to have that weakness anymore.
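
The idea of a process that always attacks the current weakness can be sketched in a few lines of Python. This is not a neural network, only an illustration under assumptions: the function name `adversarial_practice` and the `improvement` factor (how much one round of practice shrinks an error rate) are hypothetical, while the starting error rates come from the teacher example above.

```python
def adversarial_practice(error_rates, rounds, improvement=0.9):
    """Repeatedly ask about the digit with the highest error rate.

    `improvement` is an assumed factor: every time a digit is practised,
    its error rate is multiplied by this value.
    """
    errors = dict(error_rates)
    asked = []
    for _ in range(rounds):
        hardest = max(errors, key=errors.get)  # the learner's current weakness
        asked.append(hardest)
        errors[hardest] *= improvement
    return errors, asked


# Error rates from the example: 30% on most digits, 50% on 1 and 7.
start = {digit: 0.3 for digit in range(10)}
start[1] = 0.5
start[7] = 0.5

final, asked = adversarial_practice(start, rounds=10)
print("digits asked:", asked)
print("error on 1:", final[1], "error on 7:", final[7])
```

In this toy run the adversary asks only about 1 and 7 until their error rates fall to the level of the other digits, which is the behavior the paragraph above describes.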

GAN (Generative Adversarial Network): –

A GAN consists of two models: one is the Discriminator and the other is the Generator. While training a GAN, these two networks literally compete with each other over a single quantity, the discriminator’s error rate. The Generator adjusts its weights to push that error higher, and the Discriminator learns to push it lower. Let’s understand this with the help of an example.
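
As a rough illustration of this tug of war, the sketch below computes one loss for each network from the Discriminator’s outputs (its estimated probability that an input is real). The binary cross-entropy form and the "non-saturating" generator loss are common choices that this article does not specify, so treat them, the function names and the example numbers as assumptions.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Mean binary cross-entropy: D wants d_real near 1 and d_fake near 0."""
    real_terms = [-math.log(p) for p in d_real]
    fake_terms = [-math.log(1.0 - p) for p in d_fake]
    return (sum(real_terms) + sum(fake_terms)) / (len(d_real) + len(d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: G wants d_fake near 1."""
    return sum(-math.log(p) for p in d_fake) / len(d_fake)


d_real = [0.9, 0.8]        # made-up D outputs on real pictures
weak_fakes = [0.1, 0.2]    # made-up D outputs on poor fakes
strong_fakes = [0.6, 0.7]  # made-up D outputs on convincing fakes

print("D loss, weak fakes:  ", discriminator_loss(d_real, weak_fakes))
print("D loss, strong fakes:", discriminator_loss(d_real, strong_fakes))
print("G loss, weak fakes:  ", generator_loss(weak_fakes))
print("G loss, strong fakes:", generator_loss(strong_fakes))
```

When the fakes become more convincing, the Discriminator’s loss goes up and the Generator’s loss goes down, so each network can only improve at the other’s expense.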

Figure 4: GAN training process

Example: –

There is a forger who tries to make fake paintings to sell at a high price, and there is an inspector who checks paintings to tell whether each one is fake or real.

So, initially, the forger just draws some random lines on paper, and the inspector’s reaction is “hmm, I am not sure”, because initially neither the Generator nor the Discriminator has learned anything.

Later, the forger learns more about different kinds of paint in order to make paintings that look like the original, and the inspector learns to find patterns that differentiate a fake painting from an original. When the inspector looks at the forger’s newly generated painting and rejects it as a fake, the forger tries again, and this process goes on. Eventually, a situation arises where the forger makes a picture that looks original, and the inspector says, “I am not sure whether it is real or fake.” In neural network terms, the Generator produces a painting that looks exactly like the original, and the Discriminator outputs 0.5, which means it is not able to differentiate between real and fake pictures. At this point, we can chop the Discriminator off the network and keep a fully trained Generator that can generate paintings that look real.
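
Why 0.5? A known result for GANs is that, for a fixed Generator, the best possible Discriminator outputs p_real(x) / (p_real(x) + p_fake(x)), which is exactly 0.5 wherever the two distributions agree. The sketch below checks this numerically. It assumes, purely for illustration, that real and generated data are one-dimensional Gaussians; the function names and parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def optimal_discriminator(x, real, fake):
    """Best achievable D(x) when real and fake are (mean, std) Gaussians."""
    p_real = gaussian_pdf(x, *real)
    p_fake = gaussian_pdf(x, *fake)
    return p_real / (p_real + p_fake)


real = (0.0, 1.0)        # assumed distribution of real paintings
early_fake = (3.0, 1.0)  # a poor forger: fakes look quite different
late_fake = (0.0, 1.0)   # a perfect forger: fakes match the real data

for x in (-1.0, 0.0, 1.0):
    print(x,
          optimal_discriminator(x, real, early_fake),
          optimal_discriminator(x, real, late_fake))
```

Against the poor forger the inspector is confident near the real data; against the perfect forger the best it can do is answer 0.5 everywhere.
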
There is more to this. If we feed lots of images of cars to a GAN so that it can generate new car samples, then one thing is sure: the GAN now understands what a car is. This is because the network constructs a structure, called a latent space, in which each sample is represented by a feature vector, and these vectors are meaningful if we look at them. The latent space is a mapping of the input data distribution, and each dimension can correspond to a particular feature of the car. For example, one axis in latent space may correspond to the size of the car and another axis to the color of the car.

So, if we move a point smoothly through latent space, there will be a very smooth transition in the generated output as well. This is what lets the Generator produce new, unseen samples that are still similar to the input data distribution.
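
The smooth transition can be sketched as a walk between two latent vectors. Everything here is an assumption made for illustration: `toy_generator` is a hand-written stand-in for a trained Generator, and the two latent axes (size and color) simply mirror the car example above.

```python
def interpolate(z_start, z_end, steps):
    """Evenly spaced latent vectors on the line from z_start to z_end."""
    return [
        [a + (b - a) * t / (steps - 1) for a, b in zip(z_start, z_end)]
        for t in range(steps)
    ]


def toy_generator(z):
    """Hypothetical generator: axis 0 controls size, axis 1 controls color."""
    size, color = z
    return {"length_m": 3.0 + 2.0 * size, "red_level": color}


small_grey_car = [0.0, 0.0]  # assumed latent code of one car
large_red_car = [1.0, 1.0]   # assumed latent code of another car

path = interpolate(small_grey_car, large_red_car, steps=5)
cars = [toy_generator(z) for z in path]
for z, car in zip(path, cars):
    print(z, car)
```

Each intermediate latent vector produces a car that was never in the inputs but sits between the two endpoints in both size and color.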