Original article was published by Abhishek Suran on Deep Learning on Medium

# WGAN

As we discussed above, WGAN makes use of the Wasserstein loss, so let us understand the objectives of both the generator and the discriminator.

## Wasserstein Loss { E(d(R)) - E(d(F)) }

- So our loss is the difference between the expected value of the discriminator's output on real images and the expected value of its output on the generated (fake) images.
- The discriminator’s objective is to maximize this difference while the generator’s goal is to minimize this difference.
**Note:** *To use the Wasserstein loss, our discriminator needs to be 1-L (1-Lipschitz) continuous, i.e. the norm of its gradient must be at most 1 at every point.*
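In PyTorch, the quantity E(d(R)) - E(d(F)) can be sketched as follows (a minimal sketch; the function name and the assumption that the discriminator's raw scores are already computed are mine, not the article's):

```python
import torch

def wasserstein_estimate(real_scores, fake_scores):
    # E[d(R)] - E[d(F)]: the discriminator (critic) tries to maximize
    # this difference, while the generator tries to minimize it.
    return real_scores.mean() - fake_scores.mean()
```

Note that the discriminator here outputs an unbounded score rather than a probability, which is why it is often called a "critic" in the WGAN literature.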

Let us see a way to enforce 1-L continuity.

## Gradient Penalty

The gradient penalty is used to enforce 1-L continuity and is added to the loss as a regularization term on the discriminator's gradient. The following are the steps to calculate the gradient penalty.

- Compute an interpolated image from the real and fake images: **real_image * epsilon + fake_image * (1 - epsilon)**.
- Then, calculate the gradient of the discriminator's output with respect to the interpolated image, and take the norm of that gradient.
- The penalty is then the **mean of the square of (norm - 1)**, since we want the norm to be close to one.
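The steps above can be sketched in PyTorch as follows (a minimal sketch; `critic` stands for any discriminator network, and a 4-D image batch of shape `(N, C, H, W)` is assumed):

```python
import torch

def gradient_penalty(critic, real, fake):
    # Step 1: interpolate between real and fake images with a random
    # epsilon drawn per sample.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (real * eps + fake * (1 - eps)).requires_grad_(True)
    scores = critic(interp)
    # Step 2: gradient of the critic's output w.r.t. the interpolated images.
    grad = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    # Per-sample L2 norm of the gradient.
    grad_norm = grad.view(grad.size(0), -1).norm(2, dim=1)
    # Step 3: penalize deviation of the norm from 1.
    return ((grad_norm - 1) ** 2).mean()
```

`create_graph=True` is needed so that the penalty itself can be backpropagated through when it is added to the discriminator's loss.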

## Generator’s and Discriminator’s Objectives

So, the generator's final objective is to increase the mean of the discriminator's output on fake images, while the discriminator's objective is the w-loss plus a weighted gradient penalty.
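Written as losses to be minimized, the two objectives can be sketched as follows (assuming `gp` is the penalty computed as in the previous section; the weight `lambda_gp = 10` follows the WGAN-GP paper and is an assumption, not the article's stated value):

```python
import torch

def critic_loss(real_scores, fake_scores, gp, lambda_gp=10.0):
    # Negated w-loss plus the weighted gradient penalty;
    # minimizing this maximizes E[d(R)] - E[d(F)].
    return fake_scores.mean() - real_scores.mean() + lambda_gp * gp

def generator_loss(fake_scores):
    # Minimizing this increases the mean critic score on fakes.
    return -fake_scores.mean()
```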

## Training WGAN

While training a WGAN, the discriminator is updated multiple times for each step, while the generator is updated only once per step.
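A single training step could be sketched in PyTorch as below (a minimal sketch; the models, optimizers, the `N_CRITIC` count, and the inline `_gp` helper are illustrative assumptions, not the article's code):

```python
import torch

N_CRITIC = 5  # critic updates per generator update (an assumed setting)

def _gp(critic, real, fake):
    # Gradient penalty on images interpolated between real and fake.
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (real * eps + fake * (1 - eps)).requires_grad_(True)
    grad = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    return ((grad.view(grad.size(0), -1).norm(2, dim=1) - 1) ** 2).mean()

def train_step(critic, gen, critic_opt, gen_opt, real, z_dim, lambda_gp=10.0):
    batch = real.size(0)
    # Train the discriminator (critic) several times per step...
    for _ in range(N_CRITIC):
        z = torch.randn(batch, z_dim, device=real.device)
        fake = gen(z).detach()  # no generator gradients during critic updates
        c_loss = (critic(fake).mean() - critic(real).mean()
                  + lambda_gp * _gp(critic, real, fake))
        critic_opt.zero_grad()
        c_loss.backward()
        critic_opt.step()
    # ...then train the generator once.
    z = torch.randn(batch, z_dim, device=real.device)
    g_loss = -critic(gen(z)).mean()
    gen_opt.zero_grad()
    g_loss.backward()
    gen_opt.step()
    return c_loss.item(), g_loss.item()
```

Training the critic more often keeps its Wasserstein estimate accurate, which is what gives the generator a useful gradient at every step.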

## Conclusion

WGAN results in more stable training and addresses the mode collapse and vanishing-gradient problems we face with DCGAN.

But all this comes at a price, i.e. WGAN is slower to train.