How to fight mode collapse in GANs

Source: Deep Learning on Medium

What is all this fuss about Generative Adversarial Networks (GANs)?! What did this new “invention” really accomplish? Which challenges did it solve, and what limitations does it currently face? This article answers these questions simply and concisely.

A GAN is a relatively new machine learning technique invented by Ian Goodfellow in 2014. Instead of a single neural network, a GAN trains two neural networks that compete against each other in a two-player game. One network (known as the Generator) uses what it learns about the training data distribution to produce fake samples, while the other network (known as the Discriminator) is fed both fake and real samples and tries to classify each incoming sample correctly as real or fake. As the game progresses, both networks improve at their tasks: the generator becomes better at producing fake data that looks real, and the discriminator becomes better at telling fake samples from real ones. In the end, the data produced by the generator can look authentic even to human observers.
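This adversarial game can be sketched in a few lines of PyTorch. The code below is a minimal illustration, not the setup from any specific paper: the 1-D toy data, network sizes, learning rate, and step count are all my own assumptions.

```python
# Minimal GAN training loop sketch in PyTorch (toy 1-D data; all
# hyperparameters here are illustrative assumptions).
import torch
import torch.nn as nn

torch.manual_seed(0)

def real_batch(n):
    # Toy "real" data: samples from a Gaussian centred at 4.
    return torch.randn(n, 1) + 4.0

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    # Discriminator step: label real samples 1, generated samples 0.
    real = real_batch(64)
    fake = G(torch.randn(64, 8)).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to get fakes labelled as real (1).
    fake = G(torch.randn(64, 8))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

samples = G(torch.randn(256, 8)).detach()
print(samples.mean().item())  # should drift toward the real mean over training
```

The generator never sees the real data directly; its only learning signal is the gradient flowing back through the discriminator, which is exactly what makes the game adversarial.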

A famous analogy explains the roles of the Generator and the Discriminator: the counterfeiters and the police. The counterfeiters try to produce fake money that cannot be distinguished from real money. Each time the police succeed in detecting counterfeit money, the counterfeiters are pushed to become even better at producing counterfeits that escape detection, and the chase goes on! In this game, however, we hope that the police win :D.



GANs have been responsible for remarkable achievements in generating realistic, never-before-seen data in several applications, from generating images of people who do not exist (check paper here), through generating text (check paper here), to generating fake audio that is indistinguishable from real audio (check paper here), and even generating convincing music! (check paper here). However, some limitations still hinder GAN development, including: i) unstable model training, ii) difficulty in evaluating performance, and iii) the mode collapse problem. The purpose of this article is to present some recent research that proposes a multi-generator setup to mitigate some of these limitations, in particular mode collapse.

Mode collapse

Mode collapse in GANs refers to the problem of missing some of the modes of the multi-modal data the model was trained on. Simply put, a GAN trained on a dataset of digits from 0 to 9 that suffers from mode collapse would fail to generate some of the digits (e.g., it generates only 0 through 7, or everything except the digit 5, etc.). The image below shows an example of training two GANs: the first row shows a normal GAN learning to successfully generate all 10 modes (the 10 digits), while the second row shows a GAN suffering from mode collapse, generating only one mode.


Mode collapse is not limited to missing classes entirely. With animal classes (cat or dog), for example, the generator may learn to generate images of both cats and dogs, producing cats with many different colors and features but dogs with only a limited range of colors and features (e.g., only white poodles).
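A simple way to quantify the digit example is to run generated samples through a pretrained classifier and count how many classes ever appear. The sketch below assumes the classifier's predicted labels are already available as an integer array; the two "label" arrays stand in for a healthy GAN and a collapsed one.

```python
# Mode-coverage check: what fraction of the dataset's modes does the
# generator ever produce? (The label arrays are simulated stand-ins for
# the output of a pretrained digit classifier.)
import numpy as np

def mode_coverage(labels, n_modes=10):
    """Fraction of modes that appear at least once among generated samples."""
    counts = np.bincount(labels, minlength=n_modes)
    return (counts > 0).sum() / n_modes

rng = np.random.default_rng(0)
healthy = rng.integers(0, 10, size=1000)   # samples spread over all digits
collapsed = rng.choice([3, 7], size=1000)  # only ever produces 3s and 7s

print(mode_coverage(healthy))    # 1.0
print(mode_coverage(collapsed))  # 0.2
```

Metrics of this flavour (counting classifier-recognised modes in generated batches) are a common way to report mode collapse on labelled datasets such as MNIST.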

Why does mode collapse occur? Different hypotheses have been presented in the literature, yet our understanding is still lacking. What is clear, however, is what happens during mode collapse: the generator fails to model the distribution of the training data well enough.

Solutions from the literature

Now that we have explained mode collapse, it is time to present some techniques for mitigating it. One of the most promising directions of the last couple of years has been the use of multiple generators, so we will discuss this direction in detail.

AdaGAN: Inspired by boosting techniques, Tolstikhin et al. (2017) train a collection of generators instead of a single one. Training proceeds sequentially, adding a new generator to a mixture model at each step. A classifier is trained to separate the original images from the fake images generated by the current mixture, and the classifier's outputs are then used to reweight the training set. The reweighting essentially down-weights images of modes the mixture is already confident about (modes it can already generate), so the next GAN, trained on the reweighted data, is less likely to miss the modes the mixture previously missed.

This work, however, has two main limitations:

– It is computationally expensive (multiple GANs must be trained).

– It is built on the assumption that a single GAN generator can generate good enough images of at least some classes, which may not hold for challenging and diverse datasets such as ImageNet, where GANs tend to generate unidentifiable objects.
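The boosting-style reweighting at the heart of AdaGAN can be sketched as follows. This is my own simplification of the idea, not the paper's exact update rule: `d_covered` stands in for any score in (0, 1) indicating how well the current mixture already reproduces each training point, and `beta` is a hypothetical temperature parameter.

```python
# AdaGAN-style reweighting sketch (a simplification of the idea, not the
# paper's exact algorithm): down-weight training points the current mixture
# already covers, so the next generator focuses on the missed modes.
import numpy as np

def adagan_reweight(weights, d_covered, beta=1.0):
    """weights: current weights over the training set (sums to 1).
    d_covered: scores in (0, 1); high = mixture already covers this point.
    """
    new_w = weights * np.exp(-beta * d_covered)  # shrink well-covered points
    return new_w / new_w.sum()                   # renormalise to a distribution

w = np.full(4, 0.25)
scores = np.array([0.95, 0.90, 0.10, 0.05])  # first two points already covered
print(adagan_reweight(w, scores))  # mass shifts toward the last two points
```

After reweighting, the next GAN in the sequence is trained on this skewed distribution, which is what makes missed modes progressively less likely across the mixture.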

MAD-GAN: Ghosh et al. (2018) propose a setup with multiple generators and a single discriminator, and they also change the discriminator's usual task: it must not only detect whether a provided sample is real or fake, but also identify which generator produced the fake sample (if it decides the sample is fake). The only way the discriminator can tell which generator produced a particular sample is if there are recognizable differences between the generators' samples. This setup therefore encourages the generators to synthesize identifiably different samples, which pushes different modes onto different generators. In the illustrative figure below, where each row corresponds to images created by a different generator, we can see clearly that MAD-GAN helps different generators capture different modes.
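The objective described above amounts to turning the discriminator into a (K+1)-way classifier: class 0 for real data, class k for "fake, produced by generator k". The PyTorch sketch below shows one discriminator update and the generators' objective under my own toy assumptions (1-D data, small MLPs); it is an illustration of the idea, not the paper's architecture.

```python
# MAD-GAN-style objective sketch: K generators, one (K+1)-way discriminator.
# Toy 1-D data and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
K = 3  # number of generators

gens = [nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
        for _ in range(K)]
disc = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, K + 1))
ce = nn.CrossEntropyLoss()

real = torch.randn(64, 1) + 4.0  # toy real samples
# Discriminator loss: real samples -> class 0 ...
d_loss = ce(disc(real), torch.zeros(64, dtype=torch.long))
# ... and generator k's fakes -> class k+1, so D must tell generators apart.
for k, G in enumerate(gens):
    fake = G(torch.randn(64, 8)).detach()
    d_loss = d_loss + ce(disc(fake), torch.full((64,), k + 1, dtype=torch.long))

# Each generator tries to get its samples classified as real (class 0).
g_losses = [ce(disc(G(torch.randn(64, 8))), torch.zeros(64, dtype=torch.long))
            for G in gens]
print(d_loss.item(), [g.item() for g in g_losses])
```

Because the discriminator is rewarded for attributing fakes to the right generator, two generators producing identical samples make its job impossible, so the gradients it sends back implicitly push the generators toward different regions of the data distribution.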