Deep Convolutional Vs Wasserstein Generative Adversarial Network

Original article was published by Abhishek Suran on Deep Learning on Medium


As discussed above, WGAN uses the Wasserstein loss, so let us understand the objectives of both the generator and the discriminator.

Wasserstein Loss: E[d(R)] − E[d(F)]

  1. The loss is the difference between the expected value of the discriminator’s output on real images and the expected value of its output on generated (fake) images.
  2. The discriminator’s objective is to maximize this difference, while the generator’s goal is to minimize it.
  3. Note: to use the Wasserstein loss, our discriminator must be 1-L (1-Lipschitz) continuous, i.e., the norm of its gradient must be at most 1 at every point.
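The quantity above can be sketched in a few lines. This is a minimal illustration, assuming `real_scores` and `fake_scores` are arrays of discriminator outputs (the names are my own, not from the article):

```python
import numpy as np

def w_distance(real_scores, fake_scores):
    # E[d(R)] - E[d(F)]: the quantity the discriminator tries to
    # maximize and the generator tries to minimize
    return np.mean(real_scores) - np.mean(fake_scores)

# Example: real images scored near 1, fakes near 0 -> large distance
w_distance(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
```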

Let us see a way to enforce 1-L continuity.

Gradient Penalty

The gradient penalty is used to enforce 1-Lipschitz continuity and is added to the loss as a regularization term on the discriminator’s gradient. The following are the steps to calculate it.

  1. Compute an interpolated image from a real and a fake image: real_image * epsilon + fake_image * (1 - epsilon), where epsilon is sampled uniformly from [0, 1].
  2. Calculate the gradient of the discriminator’s output with respect to the interpolated image, then take the norm of that gradient.
  3. The penalty is the mean of the square of (norm - 1), since we want the norm to be close to one.
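The three steps above map almost one-to-one onto autograd calls. A minimal sketch in PyTorch (the article does not specify a framework, and `critic` stands for any discriminator network):

```python
import torch

def gradient_penalty(critic, real, fake):
    # Step 1: interpolate between real and fake images
    eps = torch.rand(real.size(0), 1, 1, 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    # Step 2: gradient of the critic's output w.r.t. the interpolation
    scores = critic(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    norms = grads.view(grads.size(0), -1).norm(2, dim=1)
    # Step 3: mean squared distance of the gradient norm from 1
    return ((norms - 1) ** 2).mean()
```

`create_graph=True` matters: it lets gradients flow through the penalty itself when the discriminator loss is backpropagated.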

Generator’s and Discriminator’s Objectives

So, the generator’s final objective is to maximize the mean of the discriminator’s output on fake images, while the discriminator’s goal is the Wasserstein loss plus a weighted gradient penalty.
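Written as minimization targets (the usual convention for optimizers), the two objectives might look like the sketch below. The function names and the penalty weight constant are my own; 10 is the weight proposed in the WGAN-GP paper:

```python
import numpy as np

LAMBDA = 10.0  # gradient-penalty weight (assumption: WGAN-GP paper default)

def generator_loss(fake_scores):
    # Generator wants to *increase* the mean critic score on fakes,
    # so it minimizes the negative mean.
    return -np.mean(fake_scores)

def critic_loss(real_scores, fake_scores, penalty):
    # Discriminator wants to maximize E[d(R)] - E[d(F)], so it minimizes
    # the negative difference plus the weighted gradient penalty.
    return np.mean(fake_scores) - np.mean(real_scores) + LAMBDA * penalty
```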

Training WGAN

While training a WGAN, the discriminator is trained multiple times (typically five) for each step, while the generator is trained once per step.
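A minimal runnable sketch of this schedule, using tiny linear stand-ins for the generator and critic and random tensors in place of a real dataset (all names and hyperparameters here are illustrative assumptions, not the article’s code):

```python
import torch

N_CRITIC = 5   # critic updates per generator update (assumption: common default)
LAMBDA = 10.0  # gradient-penalty weight

torch.manual_seed(0)
critic = torch.nn.Linear(4, 1)      # stand-in discriminator
generator = torch.nn.Linear(2, 4)   # stand-in generator
opt_c = torch.optim.RMSprop(critic.parameters(), lr=1e-3)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=1e-3)

def grad_penalty(real, fake):
    eps = torch.rand(real.size(0), 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(interp).sum(), interp,
                                create_graph=True)[0]
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

for step in range(3):                        # a few steps for illustration
    for _ in range(N_CRITIC):                # discriminator: N_CRITIC updates...
        real = torch.randn(8, 4)             # stand-in for a real batch
        fake = generator(torch.randn(8, 2)).detach()
        loss_c = (critic(fake).mean() - critic(real).mean()
                  + LAMBDA * grad_penalty(real, fake))
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    fake = generator(torch.randn(8, 2))      # ...generator: one update per step
    loss_g = -critic(fake).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Note the `.detach()` on the fake batch during the critic phase, so critic updates do not backpropagate into the generator.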


WGAN results in stable training and mitigates the mode collapse and vanishing-gradient problems we face in DCGAN.

But all this comes at a price: WGAN is slower to train.