Self-Supervised GANs using auxiliary rotation loss

Source: Deep Learning on Medium

The figure on the left shows that a standard 1-vs-all classifier trained on CIFAR-10 forgets substantially even though the tasks are similar: each time the task changes, the accuracy drops sharply. This suggests that the model does not retain generalizable representations in such a changing environment. However, this is not the case when the loss function is aided with self-supervision.

The figure on the right shows a similar effect during GAN training. Every 100k iterations, the discriminator is used for ImageNet classification, and its accuracy shows the same pattern of forgetting. Again, this is not the case with self-supervision.

Self-Supervised GAN

Before going into the details of SS-GAN, let us take a quick look at what self-supervised learning is. The idea is to train a model on a pretext task whose labels can be derived automatically from the data, so no human annotation is needed. The pretext task can be built from any known change to the input, for example predicting the rotation applied to an image or predicting the relative location of an image patch. In SS-GAN, the authors add rotation prediction to the discriminator. Along with the adversarial prediction of fake vs. real, the discriminator also tries to predict the rotation of the image from the set {0, 90, 180, 270} degrees. This is borrowed from the state-of-the-art self-supervision method proposed in [1]. The discriminator therefore has two heads, and the overall model works as shown in the figure below:
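The rotation pretext task can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the names `rotate90` and `make_rotation_batch` are made up for this example, and an image is assumed to be a single-channel grid stored as a list of rows. Each input image yields four copies, and the label of each copy is simply the number of 90-degree turns applied.

```python
from typing import List, Tuple

Image = List[List[float]]  # assumption: one channel, stored as rows of pixel values

ANGLES = (0, 90, 180, 270)  # rotation classes; label k means a turn of k * 90 degrees


def rotate90(image: Image) -> Image:
    """Rotate an image by 90 degrees counter-clockwise."""
    # Transposing and then reversing the row order moves the last column to the first row.
    return [list(row) for row in zip(*image)][::-1]


def make_rotation_batch(images: List[Image]) -> Tuple[List[Image], List[int]]:
    """Return every image at all four rotations, with the rotation index as label."""
    rotated: List[Image] = []
    labels: List[int] = []
    for image in images:
        current = [list(row) for row in image]  # copy so the input is not modified
        for label in range(len(ANGLES)):
            rotated.append(current)
            labels.append(label)
            current = rotate90(current)  # prepare the next rotation
    return rotated, labels


if __name__ == "__main__":
    batch, labels = make_rotation_batch([[[1, 2], [3, 4]]])
    for image, label in zip(batch, labels):
        print(ANGLES[label], image)
```

The labels come for free from the transformation itself, which is what makes the task self-supervised.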

Collaborative Adversarial Training

The generator and the discriminator in this model still play the adversarial minimax game, using the standard adversarial loss combined with spectral normalization and a gradient penalty. However, the aim is to mimic the benefit (in other words, the information) that a conditional GAN gets from labels. Labels help the generator decide what kind of image to generate, instead of generating pixels at random. SS-GAN makes a similar effort. The generator is not exactly conditional, as it always generates "upright" images, which are then rotated before the discriminator predicts their rotation. On the other hand, to quote the authors:

“the discriminator is trained to detect rotation angles based only on the true data.”

This prevents the generator from producing images whose rotation is trivially easy to predict.

To sum up, the discriminator has two heads. On non-rotated images, its goal is to predict real vs. fake. On rotated real images, its goal is to predict which of the four rotation angles was applied.
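As a rough sketch of how the two objectives fit together, the snippet below adds a weighted rotation cross-entropy term to each player's adversarial loss. It reflects my reading of the setup, not the authors' implementation. The function names, the weights `alpha` and `beta`, and the idea of passing the adversarial losses in as precomputed numbers are all assumptions made for illustration.

```python
import math
from typing import Sequence


def rotation_cross_entropy(
    logits: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Mean cross-entropy of the rotation head over a batch of 4-way logits."""
    total = 0.0
    for row, label in zip(logits, labels):
        peak = max(row)  # subtract the max before exponentiating, for stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[label]  # equals -log softmax(row)[label]
    return total / len(labels)


def discriminator_loss(
    adv_loss_d: float,
    real_rot_logits: Sequence[Sequence[float]],
    real_rot_labels: Sequence[int],
    beta: float,
) -> float:
    """Adversarial loss plus the rotation loss on rotated REAL images only."""
    return adv_loss_d + beta * rotation_cross_entropy(real_rot_logits, real_rot_labels)


def generator_loss(
    adv_loss_g: float,
    fake_rot_logits: Sequence[Sequence[float]],
    fake_rot_labels: Sequence[int],
    alpha: float,
) -> float:
    """Adversarial loss plus the rotation loss on rotated GENERATED images."""
    return adv_loss_g + alpha * rotation_cross_entropy(fake_rot_logits, fake_rot_labels)


if __name__ == "__main__":
    # Illustrative numbers only: an undecided rotation head outputs equal logits.
    undecided = [[0.0, 0.0, 0.0, 0.0]]
    print(discriminator_loss(0.5, undecided, [1], beta=1.0))
    print(generator_loss(0.5, undecided, [1], alpha=0.5))
```

The split is the point of the design: the discriminator's rotation head learns only from real images, while the generator is rewarded when its rotated samples are classified correctly by that same head.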

Experiments

They use standard ResNet-based architectures for the discriminator and generator, taken from the unconditional GANs that SS-GAN is compared against. The weight of the rotation loss is controlled by two hyperparameters, one for real images and one for fake images.

To compare sample quality, the authors use FID (Fréchet Inception Distance).
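For reference, FID is commonly defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The formula below is the standard definition rather than something taken from this post. The symbols are my notation: \mu_r and \Sigma_r are the mean and covariance of the real-image features, and \mu_g and \Sigma_g are those of the generated-image features. Lower values mean the two feature distributions are closer.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```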

The results are summarized in the figure below:

The important thing to note is the performance improvement that self-supervision provides over unconditional GANs.

Conclusion

In my opinion, this work opens a new line of GANs in which we can get the stable image generation of conditional GANs without using labelled data. Swapping a state-of-the-art model in for the discriminator could improve results further. The authors also propose using the method in a semi-supervised setting, with a small number of labels, for further improvement.