GAN — How to measure GAN performance?


Photo by Dawid Małecki

Inception Score (IS)

IS uses two criteria to measure the performance of a GAN:

  • The quality of the generated images, and
  • their diversity.

Entropy can be viewed as randomness. If the value of a random variable x is highly predictable, it has low entropy. Conversely, if it is highly unpredictable, the entropy is high. For example, the figure below shows two probability distributions p(x), labeled p1 and p2; p1 has a lower entropy than p2.
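To make the idea concrete, here is a minimal sketch that computes the Shannon entropy of two discrete distributions. The numbers in p1 and p2 are made up for illustration and are not taken from the figure; the helper name entropy is also just a choice for this example.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution given as a list of probabilities."""
    # Terms with p == 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical distributions over four outcomes (illustrative values only).
p1 = [0.97, 0.01, 0.01, 0.01]  # peaked: the outcome is easy to predict
p2 = [0.25, 0.25, 0.25, 0.25]  # uniform: the outcome is hard to predict

print("entropy of p1:", entropy(p1))
print("entropy of p2:", entropy(p2))  # equals log(4), the maximum for four outcomes
```

The peaked distribution has the lower entropy, and the uniform one reaches the largest value possible for four outcomes.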

In a GAN, we want the conditional probability P(y|x) to be highly predictable (low entropy), where x is a generated image and y is its class label. Given an image, we should know the object type easily. So we use an Inception network to classify the generated images and predict P(y|x). This measures the quality of the images.

P(y) is the marginal probability of the label y, obtained by averaging P(y|x) over all generated images x.

If the generated images are diverse, the distribution of y should be close to uniform (high entropy). The figure below visualizes this concept.

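To combine these two criteria, we compute the KL-divergence between P(y|x) and P(y), average it over the generated images, and take the exponential to obtain IS:

IS = exp( E_x [ KL( P(y|x) || P(y) ) ] )

A higher IS is better.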
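As a rough sketch of this computation, IS can be evaluated as the exponential of the average KL-divergence between each P(y|x) and the marginal P(y). The function below assumes the class probabilities are already available as an array with one row per generated image; in practice they would come from an Inception classifier, which is not included here. The name inception_score, the eps smoothing term, and the tiny 4-class inputs are assumptions made for illustration.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS from an (N, C) array whose rows are predicted class probabilities P(y|x)."""
    probs = np.asarray(probs, dtype=np.float64)
    # Marginal P(y): average the conditional distributions over all images.
    p_y = probs.mean(axis=0, keepdims=True)
    # KL(P(y|x) || P(y)) for each image; eps guards against log(0).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    # Exponential of the mean KL-divergence.
    return float(np.exp(kl.mean()))

# Hypothetical classifier outputs for 4 images and 4 classes.
sharp_and_diverse = np.eye(4)                    # confident, one image per class
blurry = np.full((4, 4), 0.25)                   # classifier cannot tell classes apart
collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))  # confident, but always the same class

print("sharp and diverse:", inception_score(sharp_and_diverse))  # approximately 4, the number of classes
print("blurry:", inception_score(blurry))
print("collapsed:", inception_score(collapsed))
```

Only the first case scores well: the blurry case fails the quality criterion and the collapsed case fails the diversity criterion, so both give a score of about 1. Note that the first case contains just one image per class, yet it reaches the maximum score for four classes.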

Frechet Inception Distance (FID)

In FID, we use the Inception network to extract features from a specific layer. Then we model the distribution of these features as a multivariate Gaussian with mean µ and covariance Σ. The FID between the real images x and generated images g is:

FID(x, g) = ||µ_x − µ_g||² + Tr( Σ_x + Σ_g − 2 (Σ_x Σ_g)^(1/2) )

where Tr is the trace, which sums up all the diagonal elements of a matrix. A lower FID means the generated images are closer to the real ones.
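The distance between two Gaussians described above, the squared distance between the means plus Tr(Σ_x + Σ_g − 2(Σ_x Σ_g)^(1/2)), can be sketched with NumPy alone. This is an illustrative sketch, not a reference implementation: the function names are hypothetical, random vectors stand in for the Inception features, and the trace of the matrix square root is computed from the eigenvalues of Σ_x Σ_g rather than with a dedicated matrix square root routine.

```python
import numpy as np

def gaussian_stats(features):
    """Mean vector and covariance matrix of an (N, D) feature array."""
    features = np.asarray(features, dtype=np.float64)
    return features.mean(axis=0), np.cov(features, rowvar=False)

def frechet_distance(mu_x, sigma_x, mu_g, sigma_g):
    """Frechet distance between N(mu_x, sigma_x) and N(mu_g, sigma_g)."""
    mu_x = np.asarray(mu_x, dtype=np.float64)
    mu_g = np.asarray(mu_g, dtype=np.float64)
    sigma_x = np.asarray(sigma_x, dtype=np.float64)
    sigma_g = np.asarray(sigma_g, dtype=np.float64)
    diff = mu_x - mu_g
    # The product of two covariance matrices has real, non-negative eigenvalues,
    # so Tr((sigma_x sigma_g)^(1/2)) is the sum of their square roots.
    eigvals = np.linalg.eigvals(sigma_x @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_x) + np.trace(sigma_g) - 2.0 * tr_sqrt)

# Random stand-ins for features of real and generated images (assumption:
# real features would be extracted by an Inception network).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 8))
fake_feats = rng.normal(loc=0.5, size=(500, 8))  # shifted mean mimics a worse generator

mu_r, sigma_r = gaussian_stats(real_feats)
mu_f, sigma_f = gaussian_stats(fake_feats)

fid_same = frechet_distance(mu_r, sigma_r, mu_r, sigma_r)
fid_shifted = frechet_distance(mu_r, sigma_r, mu_f, sigma_f)
print("same distribution:", fid_same)
print("shifted distribution:", fid_shifted)
```

Comparing a feature set with itself gives a distance of essentially zero, while the shifted set gives a clearly larger value.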

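FID is more robust to noise than IS. If the model generates only one image per class, it can still achieve a high IS, but its FID will be poor (a large distance), because the generated features lack the variation found in the real images.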

References

Improved Techniques for Training GANs

Are GANs Created Equal? A Large-Scale Study

Source: Deep Learning on Medium