Beating the GAN Game

Original article was published on Artificial Intelligence on Medium

Tanh, not Sigmoid

tanh v sigmoid (logistic) activations

When pre-processing image data, we should normalize it to between -1 and 1 rather than 0 and 1 [4].

This also means that our final layer activation in G should be tanh rather than sigmoid.


Another important point: after generation, don’t forget to multiply the values by 127.5 and add 127.5 to return them to the original range of 0–255.

image_array = generated_tensor.numpy() * 127.5 + 127.5
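As a sketch, the full round trip looks like this (pure NumPy; the array here is just a stand-in for real image data):

```python
import numpy as np

# Stand-in for a batch of uint8 image pixels in the range 0-255.
images = np.array([[0.0, 127.5, 255.0]])

# Pre-processing: map 0..255 into the tanh range -1..1.
normalized = (images - 127.5) / 127.5

# Post-generation: map -1..1 back to 0..255.
restored = normalized * 127.5 + 127.5
```

The same `* 127.5 + 127.5` step is what the line above applies to the generator’s output tensor.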

Don’t Overdo it with Filters

Initially, when my DCGAN (deep convolutional GAN) was struggling with variance in brightness, I figured the generator’s transpose convolutional layers were simply lacking in complexity.

So, I added more filters. It turns out this was the opposite of what I should have done.

Reducing the number of filters allowed the generator to represent the range of values much better.

Here are some filter-heavy generated MNIST numbers:

Generator filter dimensions are: 256 > 128 > 64 > 1

Now with minimal filters:

Generator filter dimensions are: 32 > 16 > 8 > 1 (you can go lower too)

Too many filters push generator values to their limits (for tanh, -1 or +1), resulting in a lack of convincing generated images.
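A minimal-filter MNIST generator along these lines might look as follows. This is a sketch using tf.keras: the 32 > 16 > 8 > 1 filter counts match the figure above, but the latent size, kernel sizes, and strides are my own illustrative assumptions, not the author’s exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    """Low-filter DCGAN generator producing 28x28x1 images in [-1, 1]."""
    return tf.keras.Sequential([
        layers.Dense(7 * 7 * 32, input_shape=(latent_dim,)),
        layers.Reshape((7, 7, 32)),                            # depth 32
        layers.Conv2DTranspose(16, 4, strides=2, padding="same"),  # 14x14x16
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Conv2DTranspose(8, 4, strides=2, padding="same"),   # 28x28x8
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # Final layer: tanh, per the pre-processing range of [-1, 1].
        layers.Conv2DTranspose(1, 4, strides=1, padding="same",
                               activation="tanh"),
    ])
```

Going lower still (e.g. 16 > 8 > 4 > 1) just means shrinking the filter arguments.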

Sparse Gradients are Bad

Sparse gradients are essentially weak signals, that is, signals with very low values.

The problem with these low-value signals is that, with the many numerical operations performed on them, they can get smaller and smaller.

Those of you familiar with RNNs will undoubtedly have come across the vanishing gradient problem. This is exactly the same.

To fix this, we can add Batch Normalization to our network [5].

It’s important to note that batch normalization should be applied after s-shaped activations (tanh, sigmoid/logistic), and before non-Gaussian activations (ReLU, LeakyReLU) [5].

LeakyReLU activation

Additionally, use LeakyReLU throughout both the generator and discriminator* [4], except for the final layer, where we use an s-shaped activation function.

*The original DCGAN uses ReLU activations in the generator, and LeakyReLU in the discriminator [3]
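Putting both rules together, a discriminator sketch might look like this. It is an illustration of the ordering described above (BatchNormalization before LeakyReLU, sigmoid only on the final layer); the filter counts and kernel sizes are assumptions of mine.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator():
    """DCGAN-style discriminator: BatchNorm before each LeakyReLU,
    s-shaped (sigmoid) activation only on the output layer."""
    return tf.keras.Sequential([
        layers.Conv2D(16, 4, strides=2, padding="same",
                      input_shape=(28, 28, 1)),
        layers.BatchNormalization(),  # before the non-Gaussian activation
        layers.LeakyReLU(0.2),
        layers.Conv2D(32, 4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),  # real/fake probability
    ])
```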

Learning Rates

Sometimes we need to find a better balance between the discriminator and generator.

Initially, my discriminator learned far too quickly, essentially freezing the generator, which was unable to make any progress.

We can balance this by lowering the learning rate of the stronger network, or raising that of the weaker one, allowing our weaker network a little more breathing space.
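In practice this just means giving each adversary its own optimizer. A sketch with tf.keras, where the discriminator is deliberately slowed down; the specific values are illustrative, not tuned recommendations:

```python
import tensorflow as tf

# Separate optimizers let the two networks learn at different speeds.
# Here the discriminator's learning rate is cut so it doesn't
# overpower the generator early in training (values are examples).
generator_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4,
                                               beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5,
                                                   beta_1=0.5)
```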

Start with MNIST

The first thing I tried to do was generate these complex art styles. When it didn’t work, I had no idea why. Were the images too complex? Was I pre-processing them correctly? Or maybe the problem lay somewhere in the network itself.

Because I had built something from scratch, I had no other implementation to compare and benchmark against. In the end, there were several problems throughout the code, but I only identified these by rebuilding it for the MNIST data-set.

This allows you to see where your results and code diverge from that of others, so problems are much easier to diagnose. Once your network is producing reasonable outputs, you can benchmark the quality against other implementations too.

So before jumping into some cool, but complex project, have a go with MNIST (or another established data-set) first.