Original article was published on Artificial Intelligence on Medium
GANs can be very helpful and quite disruptive in several areas of application, but, as with everything, there is a trade-off between their benefits and the challenges we easily run into while working with them. We can break down the challenges of GANs into 3 main problems:
- Mode collapse
- Non-convergence and instability
- High sensitivity to hyperparameters and evaluation metrics
Why mode collapse? 🔥
GANs can sometimes suffer from the limitation of generating samples that are poorly representative of the population, which means that, for example, after training a GAN on the MNIST dataset, it may happen that our Generator is unable to generate digits different from the digit 0. This condition is called mode collapse.
The main drawback is that GANs are not able to capture the whole data distribution, due to their objective function. Some experiments have shown that even for a bi-modal distribution, GANs tend to produce a good fit to the principal mode only, struggling to generalize. In summary, mode collapse is a consequence of poor generalization and can be classified into two different types:
- Most of the modes from the input data are absent from the generated data
- Only a subset of particular modes is learned by the Generator.
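A simple way to make this concrete is to check, on a synthetic dataset with known modes, how many of those modes the generated samples actually reach. The sketch below is illustrative only: the `mode_coverage` helper, the mode centers, and the coverage radius are all assumptions for the example, not part of any GAN library.

```python
import numpy as np

def mode_coverage(samples, mode_centers, radius=1.0):
    """Count how many known modes receive at least one generated sample.

    A sample "covers" a mode if it falls within `radius` of that mode's
    center (Euclidean distance). Both `mode_centers` and `radius` are
    assumed known, which is only realistic for synthetic data.
    """
    samples = np.asarray(samples, dtype=float)
    covered = 0
    for center in np.asarray(mode_centers, dtype=float):
        dists = np.linalg.norm(samples - center, axis=1)
        if np.any(dists < radius):
            covered += 1
    return covered

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])

# A collapsed generator: every sample clusters around a single mode.
collapsed = rng.normal(loc=centers[0], scale=0.1, size=(500, 2))
print(mode_coverage(collapsed, centers))  # 1 of the 3 modes covered
```

A healthy generator trained on the same mixture would cover all three centers; a collapsed one, as above, concentrates on one.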
The causes of mode collapse can vary, from an ill-suited objective function to the impact of the chosen GAN architecture given the data under analysis. But fear no more: there are options to solve this, and many efforts have been dedicated to this particular challenge.
Non-convergence and instability
The fact that GANs are composed of two networks, each with its own loss function, makes GANs inherently unstable. Diving a bit deeper into the problem, the Generator (G) loss can lead to GAN instability, which can be the cause of the vanishing gradient problem when the Discriminator (D) can easily distinguish between real and fake samples.
In the GAN architecture, D tries to minimize a cross-entropy while G tries to maximize it. When D's confidence is high and it starts to reject the samples produced by G, G's gradient vanishes.
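The vanishing gradient is easy to see numerically. In the sketch below (a minimal numpy illustration, not a training loop), the gradient of the original saturating generator loss log(1 - D(G(z))) with respect to the discriminator's logit shrinks to zero when D confidently rejects a fake, while the commonly used non-saturating variant, minimizing -log(D(G(z))), keeps a useful gradient:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Discriminator logit on a fake sample; a very negative logit means
# D confidently rejects the fake, i.e. D(G(z)) = sigmoid(logit) ~ 0.
logit = -8.0
d_fake = sigmoid(logit)

# Saturating loss the generator minimizes: log(1 - D(G(z))).
# Its derivative w.r.t. the logit is -sigmoid(logit), which vanishes.
grad_saturating = -sigmoid(logit)

# Non-saturating alternative: minimize -log(D(G(z))).
# Its derivative w.r.t. the logit is -(1 - sigmoid(logit)), close to -1.
grad_non_saturating = -(1.0 - sigmoid(logit))

print(f"D(G(z)) = {d_fake:.5f}")
print(f"saturating gradient     = {grad_saturating:.5f}")
print(f"non-saturating gradient = {grad_non_saturating:.5f}")
```

With the saturating loss, a strong D leaves G with almost no signal to learn from; the non-saturating loss preserves that signal exactly in the regime where G needs it most.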
This might relate to the hypothesis of the existence of local equilibria in the non-convex game that we are targeting when training GANs, as proposed in an article about GAN convergence and stability. Some options have already been proposed in the literature to mitigate this problem, such as reversing the target employed for constructing the cross-entropy cost, or applying a gradient penalty to avoid local equilibria.
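To give a flavor of the gradient-penalty idea, the sketch below computes a WGAN-GP-style penalty that pushes the critic's gradient norm toward 1 at points interpolated between real and fake samples. Since plain numpy has no autograd, a hypothetical linear critic f(x) = w @ x is used so that its gradient is known in closed form (it is simply w everywhere); a real implementation would differentiate an actual critic network instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear critic f(x) = w @ x, so grad_x f(x) = w everywhere.
w = np.array([0.6, 0.8, 2.0])

def gradient_penalty(real, fake, w, lam=10.0):
    """WGAN-GP-style penalty: penalize deviations of the critic's
    gradient norm from 1 at random interpolates of real and fake data."""
    eps = rng.uniform(size=(real.shape[0], 1))
    interp = eps * real + (1 - eps) * fake  # points where the norm is checked
    # For a linear critic the gradient at every interpolate is just w;
    # with a neural critic this line would be an autograd call.
    grad_norms = np.full(interp.shape[0], np.linalg.norm(w))
    return lam * np.mean((grad_norms - 1.0) ** 2)

real = rng.normal(size=(4, 3))
fake = rng.normal(size=(4, 3))
print(gradient_penalty(real, fake, w))
```

The penalty is added to the critic's loss, discouraging the sharp gradients that contribute to unstable training.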
What about hyperparameters and evaluation?
No cost function will work without the selection of good hyperparameters, and GANs are no exception: they are even more sensitive to the selection of network hyperparameters. The right selection of hyperparameters can be tedious and time-consuming, and so far the majority of the efforts have focused on topics such as mode collapse or GANs' struggles to converge.
No cost function will work without the selection of good hyperparameters!
Moreover, GANs lack meaningful measures to evaluate the quality of their output. Since their creation, GANs have been widely used across a variety of application areas, from supervised representation learning and semi-supervised learning to inpainting, denoising, and synthetic data creation. This breadth of applications brings along a lot of heterogeneity, which makes it harder to define how we can evaluate the quality of these networks. Because there are no robust or consistent metrics defined, in particular for image generation, it is difficult to evaluate which GAN algorithms outperform others. A series of evaluation methods have been proposed in the literature to overcome this challenge; you can find interesting details about GAN evaluation metrics in this article.
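One of the more popular proposals for image GANs is the Fréchet Inception Distance, which fits a Gaussian to feature vectors of real and generated samples and measures the distance between the two Gaussians. The sketch below shows only the distance formula itself, under a simplifying assumption of diagonal covariances; the real metric uses Inception-network features and full covariance matrices.

```python
import numpy as np

def frechet_distance_diag(feats_a, feats_b):
    """Fréchet distance between two feature sets, fitting a Gaussian to
    each and keeping only diagonal covariances for simplicity."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    var_a, var_b = feats_a.var(axis=0), feats_b.var(axis=0)
    mean_term = np.sum((mu_a - mu_b) ** 2)
    # For diagonal covariances, Tr(Ca + Cb - 2*sqrt(Ca*Cb)) reduces to:
    cov_term = np.sum((np.sqrt(var_a) - np.sqrt(var_b)) ** 2)
    return mean_term + cov_term

rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(2000, 8))
good_feats = rng.normal(0.0, 1.0, size=(2000, 8))  # matches real statistics
bad_feats = rng.normal(3.0, 0.2, size=(2000, 8))   # far from real statistics

print(frechet_distance_diag(real_feats, good_feats))  # small
print(frechet_distance_diag(real_feats, bad_feats))   # large
```

Lower values indicate generated samples whose feature statistics are closer to the real data, which is why the score is often reported alongside qualitative inspection.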