This is my note on the paper by Chongxuan Li, Jun Zhu, and Bo Zhang, which was accepted at the NIPS 2017 conference. The basic idea is super simple: to perform both class-conditional image generation and classification in semi-supervised learning (SSL), the paper introduces a new player called the “classifier”. This clarifies the roles of the two existing players in GANs, the “generator” and the “discriminator”. The paper carries out this idea beautifully under the name “Triple-GAN”.
The paper consists of four main parts, and I’ll follow the same organization. First, the GAN model and its problems are introduced; to explain the basic concept of GANs, I refer to the original 2014 paper by Ian Goodfellow et al. Second, related works that try to solve current GAN problems are reviewed. Then Triple-GAN is explained, including an examination of the three-player game from the viewpoint of game theory and its theoretical analysis. Lastly, I illustrate the experimental results reported in the paper.
The GAN model and its problems
A GAN is formulated as a two-player game, where the generator G takes random noise z as input and produces a sample G(z) in the data space, while the discriminator D identifies whether a given sample comes from the true data distribution p(x) or from the generator distribution p_g(x).
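For reference, the two-player game can be written as the standard minimax objective from Goodfellow et al. (2014), where p_z(z) is the noise prior and p_g is the distribution of G(z):

```latex
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```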
The global equilibrium of this game is achieved if and only if the generator distribution equals the true data distribution, which is what we want for image generation. Cat-GAN generalizes GANs with a categorical discriminative network and an objective function that minimizes the conditional entropy of the predictions given real data and maximizes it given generated samples. There are two main problems with existing GANs for SSL: (1) the generator and the discriminator may not be optimal at the same time; and (2) the generator cannot control the semantics of the generated samples.
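The equilibrium claim can be checked numerically on a toy finite sample space. This is a minimal sketch, assuming small made-up distributions: for a fixed generator, the optimal discriminator is D*(x) = p(x) / (p(x) + p_g(x)), and plugging it into V gives −log 4 + 2·JSD(p ‖ p_g), which reaches its minimum −log 4 only when p_g = p. The function names are illustrative, not from any library.

```python
import numpy as np

def optimal_discriminator(p, p_g):
    """D*(x) = p(x) / (p(x) + p_g(x)) for a fixed generator."""
    return p / (p + p_g)

def gan_value(p, p_g, d):
    """V(D, G) on a finite sample space: E_p[log D] + E_{p_g}[log(1 - D)]."""
    return np.sum(p * np.log(d)) + np.sum(p_g * np.log(1.0 - d))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two strictly positive distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.3, 0.2])  # toy "true" distribution over 3 outcomes
for p_g in (np.array([0.2, 0.3, 0.5]), p.copy()):
    v = gan_value(p, p_g, optimal_discriminator(p, p_g))
    # Both numbers agree; for p_g == p they equal -log 4.
    print(round(v, 4), round(-np.log(4) + 2 * jensen_shannon(p, p_g), 4))
```

The strictly positive toy probabilities avoid log(0); a real GAN never evaluates these densities explicitly.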
(1) Existing GAN-based SSL methods train the discriminator as a classifier with one extra class for fake data, so it must both classify real samples correctly and reject generated ones, while the generator tries to produce samples indistinguishable from real data. These objectives are hard to satisfy at the same time. The authors attribute this to the two-player formulation, in which a single discriminator has to play two roles: identifying fake samples and predicting labels. An optimal D should both decide whether x is real or fake and predict the correct class of x, and these goals conflict, so D effectively has two incompatible convergence points.
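The “one extra class” setup can be sketched as a single (K+1)-way softmax whose last output means “fake”. This is an illustrative sketch under that assumption (the helper names are hypothetical): the real/fake decision and the class prediction are both read off the same network output, so both objectives train the same parameters.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def two_roles(logits):
    """Split a (K+1)-way output into the two roles of a single discriminator.

    Assumption for illustration: the last index is the "fake" class.
    """
    probs = softmax(logits)
    p_real = 1.0 - probs[..., -1]                  # role 1: real vs. fake
    p_label = probs[..., :-1] / p_real[..., None]  # role 2: p(y | x, real)
    return p_real, p_label

logits = np.array([[2.0, 0.5, -1.0, 0.0]])  # K = 3 real classes + 1 fake class
p_real, p_label = two_roles(logits)
print(p_real, p_label.argmax(axis=-1))
```

In a real network every logit comes from shared layers, so gradients from rejecting fakes and from classifying reals meet in the same weights.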
(2) None of the existing GANs can learn disentangled representations, such as the object category, in SSL, though some works can learn such representations given full labels. In the current two-player model, the discriminator takes a single data point rather than a data-label pair when judging whether it is real or fake, so the generator receives no feedback about label information from the discriminator.
What is Triple-GAN?
Introducing a new player, the “classifier”, feels like a natural idea to me. What is notable is how the three players communicate and give feedback to each other within the same game. The paper presents this model as “Triple-GAN” and illustrates it with an architecture chart.
The classifier and the generator produce pseudo labels given real data and pseudo data given real labels, respectively. The discriminator focuses on a single role: judging whether a data-label pair comes from the real labeled dataset or not. The key point of Triple-GAN is three joint distributions over data-label pairs: the true distribution and the distributions defined by the classifier and the generator networks.
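In the paper’s notation, as I read it, the three joint distributions can be written as follows; p(x) and p(y) are the true marginals, while p_c(y|x) and p_g(x|y) are defined by the classifier and the generator:

```latex
\begin{aligned}
&p(x, y) && \text{true data-label distribution} \\
&p_c(x, y) = p(x)\, p_c(y \mid x) && \text{classifier: real } x \text{, pseudo label } y \\
&p_g(x, y) = p(y)\, p_g(x \mid y) && \text{generator: real label } y \text{, pseudo } x
\end{aligned}
```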
This three-network formulation is expected to solve the two problems above: (1) the desired equilibrium is one where both the classifier and the conditional generator are optimal, meaning that a good classifier leads to a good generator and vice versa; (2) the discriminator can access the label information from the classifier and the data information from the generator, and integrate both. Reaching this equilibrium raises interesting theoretical and practical questions, because it requires carefully designed, compatible utilities, including adversarial losses and unbiased regularizations, so that the three-player game converges to the desired optimum.
The authors discuss related GAN-based work on image generation and classification in SSL, such as Cat-GAN, LAP-GAN, DC-GAN, Info-GAN, and ALI (Adversarially Learned Inference). They also mention other neural networks such as conditional VAEs, ADGM, and Ladder Networks.
I would be happy to share my notes on each of these networks in another article.
How does the model build and work?
The goal is to predict the labels y for unlabeled data as well as to generate new samples x conditioned on y. The authors build their game-theoretic objective on the insight that the joint distribution can be factorized in two ways: p(x, y) = p(x)p(y|x) = p(y)p(x|y).
These factorizations correspond to the three components of Triple-GAN: (1) a classifier C that characterizes the conditional distribution p_c(y|x) ≈ p(y|x), (2) a class-conditional generator G that characterizes p_g(x|y) ≈ p(x|y), and (3) a discriminator D that judges data-label pairs against the real distribution p(x, y). The pseudo input-label pairs (x, y) produced by C and G are sent to D together with the real data-label pairs.
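The way the three kinds of pairs are formed can be sketched with toy NumPy stand-ins. This is a minimal sketch, not the authors’ code: the linear-softmax classifier, the linear conditional generator, the sizes, and the uniform label prior p(y) are all hypothetical choices for illustration. C pairs use real unlabeled x with a label sampled from p_c(y|x); G pairs use a label sampled from p(y) with x = G(y, z).

```python
import numpy as np

rng = np.random.default_rng(0)
K, DIM, NOISE = 3, 4, 2  # hypothetical toy sizes: classes, data dim, noise dim

# Hypothetical toy networks: linear-softmax classifier and linear conditional generator.
W_c = rng.normal(size=(DIM, K))
W_g = rng.normal(size=(K + NOISE, DIM))

def classifier_probs(x):
    """p_c(y | x) for a batch of inputs of shape (n, DIM)."""
    logits = x @ W_c
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def generator(y, z):
    """x = G(y, z): concatenate a one-hot label with noise, map to data space."""
    return np.concatenate([np.eye(K)[y], z], axis=1) @ W_g

def make_pairs(x_lab, y_lab, x_unlab, n_gen):
    # (1) real pairs drawn from p(x, y)
    real = (x_lab, y_lab)
    # (2) classifier pairs from p_c(x, y) = p(x) p_c(y|x): sample a label per unlabeled x
    probs = classifier_probs(x_unlab)
    y_c = np.array([rng.choice(K, p=row) for row in probs])
    from_c = (x_unlab, y_c)
    # (3) generator pairs from p_g(x, y) = p(y) p_g(x|y): uniform p(y) assumed
    y_g = rng.integers(0, K, size=n_gen)
    from_g = (generator(y_g, rng.normal(size=(n_gen, NOISE))), y_g)
    return real, from_c, from_g

x_lab = rng.normal(size=(5, DIM))
y_lab = rng.integers(0, K, size=5)
x_unlab = rng.normal(size=(8, DIM))
real, from_c, from_g = make_pairs(x_lab, y_lab, x_unlab, n_gen=6)
print([pair[0].shape for pair in (real, from_c, from_g)])
```

All three batches have the same (x, y) form, which is what lets a single discriminator score them.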
The parameter α is a constant that controls the relative importance of generation and classification; the paper focuses on the balanced case by fixing α = 1/2. The equilibrium is achieved if and only if p(x, y) = (1 − α)p_g(x, y) + αp_c(x, y).
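For reference, the three-player objective can be written as below, as I read the paper’s formulation (check against the original). The second line is the optimal discriminator for fixed C and G, which explains why the equilibrium condition involves the mixture p_α:

```latex
\begin{aligned}
\min_{C,G} \max_D \; U(C, G, D) ={}& \mathbb{E}_{(x,y) \sim p(x,y)}\big[\log D(x, y)\big] \\
&+ \alpha\, \mathbb{E}_{(x,y) \sim p_c(x,y)}\big[\log\big(1 - D(x, y)\big)\big] \\
&+ (1 - \alpha)\, \mathbb{E}_{y \sim p(y),\, z \sim p_z(z)}\big[\log\big(1 - D(G(y, z), y)\big)\big] \\[4pt]
D^*_{C,G}(x, y) ={}& \frac{p(x, y)}{p(x, y) + p_\alpha(x, y)}, \qquad p_\alpha(x, y) = (1 - \alpha)\, p_g(x, y) + \alpha\, p_c(x, y)
\end{aligned}
```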
Unfortunately, this condition alone cannot guarantee that p(x, y) = p_g(x, y) = p_c(x, y) is the unique global optimum. To address this, they add the standard supervised loss (i.e., the cross-entropy loss R_L on labeled data) to C.
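A toy example makes the non-uniqueness concrete. This sketch uses a made-up 2×2 joint distribution (rows are x, columns are y) and α = 1/2: shifting probability mass in opposite directions for G and C keeps their mixture equal to p, so the adversarial equilibrium condition holds even though p_g ≠ p_c. The perturbation preserves p(x) for C and p(y) for G, so both joints are valid under the factorizations above. The cross-entropy R_L = E_p[−log p_c(y|x)] is smallest when p_c(y|x) = p(y|x), so adding it rules out this kind of solution.

```python
import numpy as np

# Made-up toy joint distribution over 2 inputs x 2 labels (rows: x, cols: y).
p = np.array([[0.3, 0.2],
              [0.1, 0.4]])
# Zero-sum perturbation: rows and columns each sum to 0,
# so p(x) (row sums) and p(y) (column sums) are unchanged.
delta = np.array([[0.05, -0.05],
                  [-0.05, 0.05]])
alpha = 0.5
p_g = p + delta  # generator's joint, differs from p
p_c = p - delta  # classifier's joint, differs from p

def cross_entropy_RL(p, p_c):
    """R_L = E_{(x,y)~p}[-log p_c(y|x)], with p_c(y|x) = p_c(x,y) / p_c(x)."""
    cond = p_c / p_c.sum(axis=1, keepdims=True)
    return -np.sum(p * np.log(cond))

p_alpha = (1 - alpha) * p_g + alpha * p_c
print(np.allclose(p_alpha, p))  # True: adversarial equilibrium condition holds
print(np.allclose(p_g, p_c))    # False: yet C and G disagree
print(cross_entropy_RL(p, p_c) > cross_entropy_RL(p, p))  # True: R_L penalizes this C
```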
With this extra term, they prove that the game reaches its global optimum for C and G if and only if p(x, y) = p_g(x, y) = p_c(x, y).
The algorithm is shown below.
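When training the classifier, since label information is extremely scarce in SSL, they propose a pseudo discriminative loss R_P, which trains C on the pseudo data-label pairs produced by G. The reasoning is that a good G provides meaningful labeled data beyond the training set, which can boost predictive performance. It is worth noting that directly minimizing the KL divergence between p_g(x, y) and p_c(x, y) is infeasible; with respect to C, however, minimizing R_P is equivalent to minimizing it, because p_c(x, y) = p(x)p_c(y|x) and the remaining terms of the KL divergence do not depend on C.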
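The minibatch objectives can be sketched as plain functions of the discriminator and classifier outputs. This follows the structure of the paper’s algorithm as I read it and is not the authors’ code: D ascends its objective, while C and G descend theirs; C’s adversarial term is weighted by p_c(y_c|x_c) so that gradients reach C through the probability it assigns to the sampled discrete label. The function name, the EPS guard, and the default α_P value are hypothetical.

```python
import numpy as np

EPS = 1e-8  # numerical guard against log(0); an assumption of this sketch

def triple_gan_objectives(d_real, d_c, d_g, pc_of_yc, pc_lab, pc_gen,
                          alpha=0.5, alpha_p=0.1):
    """Minibatch objectives for D, C and G (hypothetical helper).

    d_real:   D(x, y) on real labeled pairs
    d_c:      D(x_c, y_c) on classifier pairs
    d_g:      D(x_g, y_g) on generator pairs
    pc_of_yc: p_c(y_c | x_c) on classifier pairs
    pc_lab:   p_c(y | x) of the true label on labeled data (for R_L)
    pc_gen:   p_c(y_g | x_g) on generator pairs (for R_P)
    alpha_p:  weight of R_P (hypothetical default)
    """
    # D ascends this quantity.
    d_obj = (np.mean(np.log(d_real + EPS))
             + alpha * np.mean(np.log(1.0 - d_c + EPS))
             + (1.0 - alpha) * np.mean(np.log(1.0 - d_g + EPS)))
    r_l = np.mean(-np.log(pc_lab + EPS))  # supervised cross-entropy R_L
    r_p = np.mean(-np.log(pc_gen + EPS))  # pseudo discriminative loss R_P
    # C descends this: weighted adversarial term plus both regularizers.
    c_loss = alpha * np.mean(pc_of_yc * np.log(1.0 - d_c + EPS)) + r_l + alpha_p * r_p
    # G descends this.
    g_loss = (1.0 - alpha) * np.mean(np.log(1.0 - d_g + EPS))
    return d_obj, c_loss, g_loss

rng = np.random.default_rng(0)
batch = lambda: rng.uniform(0.1, 0.9, size=4)  # fake network outputs for a demo
print(triple_gan_objectives(batch(), batch(), batch(), batch(), batch(), batch()))
```

In practice these terms would be computed inside a deep learning framework so that automatic differentiation supplies the gradients; NumPy here only shows how the terms combine.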
The experimental results
First, the authors evaluate Triple-GAN on the MNIST, SVHN, and CIFAR10 datasets, with results averaged over 10 runs, using n = 100, 1000, and 4000 labeled examples, respectively.
Second, they evaluate their method with 20, 50, and 200 labeled samples on MNIST, comparing against Improved-GAN.
Triple-GAN simultaneously achieves state-of-the-art classification results among deep generative models, disentangles styles and classes, and transfers smoothly at the data level via interpolation in the latent space.
Source: Deep Learning on Medium