Look at the above eight images. Can you identify which ones are real images of celebrities and which ones are generated by a generative model? (Scroll to the end of this post for the answer. Read through the post for more information!)
Automated generative models have progressed a lot over the last decade (Rezende et al., 2014; Kingma et al., 2014). The latest and most successful variant of generative models in the market today is called Generative Adversarial Networks, aka GANs. Ever since their inception in 2014 (Goodfellow et al., 2014), there has been a lot of discussion and research about GANs (link). To put things in perspective, in the 4.5 years since GANs were proposed, the original paper has received ~4800 citations. That is more than 1000 scientific papers and documents per year citing this work, or roughly 2.9 new citing documents published every day! Today, GANs can generate high-resolution, almost-real-looking images from nothing but random numbers as input.
Now, you might be thinking, “Wow! This is really amazing. Can I start using GANs now?” Err… that’s exactly where the challenge kicks in. To an extent, yes, you can start using GANs now from one of these places: PyTorch-GAN, Keras-GAN, tf-GAN. However, these libraries require you to be an expert in GANs, well versed in Python, and fluent in the usage of the corresponding library. While none of this is impossible, it takes a considerable amount of time to climb the initial learning curve and start playing around with GAN models.
What is missing in the literature is a ready-to-use toolkit that is agnostic of the underlying library or language and gives users a very intuitive interface for playing around with GAN models. The challenge in true democratization is to enable any software engineer or developer to consume GAN models without the need for expert-level knowledge.
To solve this set of challenges in consuming GAN models, at the IBM Research lab in India, we developed an open-source Gan-Toolkit (found here: https://github.com/IBM/gan-toolkit). The features of the toolkit are as follows:
- It has a highly modularized and language-agnostic, intuitive representation of GANs based on these modules,
- It provides a highly flexible, no-code way of implementing GAN models. The details of a GAN model can be provided as a config file or as command-line arguments, and there is no requirement to write any code!
The following figure shows the highly modularized GAN formulation and the no-code approach of designing GAN architectures.
According to our modular representation, a GAN can be represented by defining the following six components:
- Generator function
- Discriminator function
- Loss function
- Optimizer function
- Training process
- Real training data
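Conceptually, these six components plug into a single adversarial training loop. Below is a minimal, library-free Python sketch of how they fit together; all names and the toy stand-ins are purely illustrative and are not the gan-toolkit's actual API.

```python
import random

# Sketch of the six-component GAN decomposition (names are illustrative,
# NOT the gan-toolkit's actual API).

def train_gan(generator, discriminator, loss_fn, optimizer_step, real_data, epochs=3):
    """Generic adversarial training loop wired from the six components."""
    history = []
    for epoch in range(epochs):                      # 5. training process
        for real in real_data:                       # 6. real training data
            noise = random.random()
            fake = generator(noise)                  # 1. generator function
            d_real = discriminator(real)             # 2. discriminator function
            d_fake = discriminator(fake)
            # 3. loss function: push d_real toward 1, d_fake toward 0
            d_loss = loss_fn(d_real, 1.0) + loss_fn(d_fake, 0.0)
            optimizer_step("discriminator", d_loss)  # 4. optimizer function
            # Generator tries to fool the discriminator (target = 1)
            g_loss = loss_fn(discriminator(generator(noise)), 1.0)
            optimizer_step("generator", g_loss)
        history.append((epoch, d_loss, g_loss))
    return history

# Toy stand-ins so the loop runs end to end:
gen = lambda z: z * 0.5
disc = lambda x: min(max(x, 0.0), 1.0)
sq_loss = lambda pred, target: (pred - target) ** 2
step = lambda which, loss: None  # a real optimizer would update weights here

log = train_gan(gen, disc, sq_loss, step, real_data=[0.9, 0.8], epochs=2)
print(len(log))  # → 2, one entry per epoch
```

The point of the decomposition is that each of the six slots can be swapped independently, which is what enables the mix-and-match of components described later in the post.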
How to Implement a GAN Using Our Toolkit
Let’s take the example of a popular GAN model for images: the Deep Convolutional GAN, aka DCGAN (link). The DCGAN model’s architecture is shown visually as follows:
As can be seen, DCGAN’s generator component is a deconvolution-style (transposed convolution) architecture and its discriminator is a CNN, with a binary cross-entropy loss function and an Adam optimizer. An implementation of DCGAN in TensorFlow can be found here. It takes roughly ~500 lines of Python code written in TensorFlow to implement DCGAN. That is not feasible for every developer and software engineer.
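To see why stacking a few of those transposed convolutions ("deconvolutions") turns a small latent tensor into a full image, here is the standard output-size arithmetic. The 4/2/1 kernel/stride/padding values are the usual DCGAN choices; the starting size of 4 is an assumption for illustration.

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    """Output spatial size of a transposed convolution (no output_padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# DCGAN-style generator: each 4x4, stride-2, pad-1 deconv doubles the feature map.
size = 4          # spatial size after projecting the noise vector, assumed here
sizes = [size]
for _ in range(4):  # four upsampling blocks
    size = deconv_out(size)
    sizes.append(size)

print(sizes)  # → [4, 8, 16, 32, 64]
```

With the chosen hyperparameters the formula reduces to `2 * size`, so four blocks take a 4×4 map up to a 64×64 image.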
Using our gan-toolkit (clone it here: https://github.com/IBM/gan-toolkit), one can simply design a GAN model by defining a config file as follows:
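The original post shows the config file as an image; a config along these lines is what is meant. The field names below are approximated from the toolkit's README and should be treated as illustrative, not authoritative — check the repository for the exact schema.

```json
{
    "GAN_model": {
        "epochs": "20"
    },
    "generator": {
        "choice": "dcgan"
    },
    "discriminator": {
        "choice": "dcgan"
    },
    "data_path": "datasets/dataset1.p",
    "metric_evaluate": "MMD"
}
```

Training is then launched with something like `python main.py --config my_gan.json`; the exact entry point and flag are documented in the repository README.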
Just providing this config file to our toolkit is enough to train a GAN model and obtain the performance results. There is no need to write even a single line of code!
The intermediate images generated after every epoch are also stored, provided to the user, and can be visualized as follows:
The config file of the gan-toolkit not only provides abstraction and easy-to-use capabilities; it also provides plenty of flexibility and customization. For example, the same DCGAN could be defined as follows:
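The more explicit variant shown in the original post pins down each module individually. Again, the keys below are an illustrative reconstruction, not the toolkit's exact schema; the image in the original post carries the authoritative version.

```json
{
    "GAN_model": {
        "epochs": "20",
        "mini_batch_size": "64",
        "loss": "binary_crossentropy",
        "optimizer": "adam"
    },
    "generator": {
        "choice": "dcgan"
    },
    "discriminator": {
        "choice": "dcgan"
    },
    "data_path": "datasets/dataset1.p"
}
```

Leaving a field out falls back to the model's defaults, which is why the shorter config earlier in the post also works.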
For more information on the customization capabilities, please refer to our documentation here.
Implemented GAN Models
- Vanilla GAN: Generative Adversarial Networks (Goodfellow et al., 2014)
- C-GAN: Conditional Generative Adversarial Networks (Mirza et al., 2014)
- DC-GAN: Deep Convolutional Generative Adversarial Network (Radford et al., 2016)
- W-GAN: Wasserstein GAN (Arjovsky et al., 2017)
- W-GAN-GP: Improved Training of Wasserstein GANs (Gulrajani et al., 2017)
Comparison with Other Toolkits
Recognizing the importance of ease in training GAN models, a few other toolkits are available in the open-source domain, such as Keras-GAN, TF-GAN, and PyTorch-GAN. However, our gan-toolkit has the following advantages:
- Highly modularized representation of the GAN model for easy mix-and-match of components across architectures. For instance, one can use the `generator` component from DCGAN and the `discriminator` component from CGAN, with the training process of WGAN.
- An abstract representation of the GAN architecture to provide multi-library support. Currently, we provide PyTorch support for the provided `config` file; in the future, we plan to support Keras and TensorFlow as well. Thus, the abstract representation is library agnostic.
- A code-free way of designing GAN models. A simple JSON file is all that is required to define a GAN architecture, and there is no need to write any training code to train the GAN model.
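As a concrete sketch of the mix-and-match advantage above, a config pairing DCGAN's generator with CGAN's discriminator under WGAN's training process might look like the following. The keys mirror the illustrative schema used earlier in this post and are not guaranteed to match the toolkit's exact format; the repository README has the authoritative examples.

```json
{
    "GAN_model": {
        "choice": "wgan"
    },
    "generator": {
        "choice": "dcgan"
    },
    "discriminator": {
        "choice": "cgan"
    },
    "data_path": "datasets/dataset1.p"
}
```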
The code and documentation of our GAN Toolkit can be found here: https://github.com/IBM/gan-toolkit. For any queries or issues, please raise an issue on GitHub or reach out to Anush Sankaran (firstname.lastname@example.org).
Answer to Figure 1: The images in the bottom row are real celebrity face images from the public CelebA face dataset (LINK). The images in the top row were generated by a GAN model (described here) trained on the CelebA dataset. These images are generated from nothing but random numbers!
- Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082.
- Kingma, D. P., Mohamed, S., Rezende, D. J., & Welling, M. (2014). Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems (pp. 3581–3589).
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672–2680).
- Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
Source: Deep Learning on Medium