Source: Deep Learning on Medium
Consider a crime in which the victim saw the criminal and remembers his or her face. How will the police catch the criminal? One option is for the victim to describe the criminal to a sketch artist, who then draws a portrait. This is the usual approach, but what if a computer could generate the criminal's face in a few seconds? Simply give the machine a description of the criminal and it produces the image immediately. Seems fascinating, right?
This is possible today because of advances in deep learning. One such technology, Generative Adversarial Networks (GANs), is a major step forward in the field of image generation. A GAN takes d-dimensional noise as input and generates a random image that resembles the images in the given data set. Because a GAN is an unsupervised algorithm, you cannot control which image it generates. But what if you want to generate a specific image? In the crime scenario above, for example, we want an image of one particular criminal.
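As a rough illustration of the input side only, the sketch below samples a d-dimensional noise vector and pushes it through a toy, untrained "generator" (one random linear layer followed by tanh). The names `sample_noise` and `toy_generator`, the sizes, and the layer itself are assumptions made for illustration; a real GAN generator is a deep network trained against a discriminator.

```python
import math
import random


def sample_noise(d, rng):
    """Draw a d-dimensional noise vector z from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def toy_generator(z, weights, height, width):
    """Map noise z to a height x width 'image' with one linear layer + tanh.

    Stand-in for a trained generator network; the weights here are random.
    """
    pixels = [math.tanh(sum(w * x for w, x in zip(row, z))) for row in weights]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


rng = random.Random(0)
d, height, width = 8, 4, 4  # illustrative sizes only
weights = [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(height * width)]

image_a = toy_generator(sample_noise(d, rng), weights, height, width)
image_b = toy_generator(sample_noise(d, rng), weights, height, width)

# Different noise gives a different image, and nothing in the input
# lets us ask for a particular one.
print(image_a != image_b)
```

The only input is the noise vector, which is why a plain GAN gives no handle for requesting a particular face.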
The drawback of a plain GAN, then, is that it generates random images. In this article, I will talk about how we can generate a specific image with a GAN; for this we have to train the GAN with a supervised approach. I am not saying that I will build a supervised GAN; rather, it will be supervised in a logical sense.
The Main Idea
The main idea behind this approach is that, in the GAN's training data, every text description has a corresponding image as its label, and therefore the GAN will be able to generate a specific image.
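One way to picture this pairing is the minimal sketch below. It assumes the text is turned into a vector and concatenated with the noise, which is a common way of conditioning a GAN but is not confirmed as this article's exact implementation. `TrainingPair`, `embed_caption` (a crude bag-of-words hash, not a real text encoder) and `generator_input` are hypothetical names, and the captions and pixel values are made up.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingPair:
    caption: str               # text description (the generator's input)
    image: List[List[float]]   # target image (acts as the label)


def embed_caption(caption, dim):
    """Hypothetical text encoder: a deterministic bag-of-words hash embedding."""
    vec = [0.0] * dim
    for word in caption.lower().split():
        vec[sum(ord(ch) for ch in word) % dim] += 1.0
    return vec


def generator_input(caption, noise_dim, embed_dim, rng):
    """Concatenate the caption embedding with fresh noise (assumed conditioning)."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return embed_caption(caption, embed_dim) + noise


rng = random.Random(0)
pairs = [
    TrainingPair("a small red bird", [[0.9, 0.1], [0.8, 0.2]]),
    TrainingPair("a large blue bird", [[0.1, 0.9], [0.2, 0.8]]),
]

for pair in pairs:
    x = generator_input(pair.caption, noise_dim=4, embed_dim=6, rng=rng)
    # x is what the generator would see; pair.image is what it should produce.
    print(pair.caption, len(x))
```

Because each caption now has a known target image, the generator's output can be judged against that image rather than against the data set as a whole.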
Step 1: Image captioning
This step generates the training data set for image generation by the Generative Adversarial Network. It takes an image as input and feeds it to a Convolutional Neural Network (CNN), which comprises a series of convolution and pooling layers and finally produces a feature vector. This vector is then fed into a Recurrent Neural Network (RNN), which generates the corresponding caption. For a detailed discussion and implementation of image captioning, you can visit and follow my other post, Image Captioning.
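The flow described above (convolution and pooling, then a feature vector, then an RNN that emits words) can be sketched in plain Python as follows. This is a toy under stated assumptions: one convolution layer instead of a series, random untrained weights, a made-up five-word vocabulary, and a decoder that does not feed the previous word back in. All function names are hypothetical, and the caption it prints is meaningless.

```python
import math
import random

VOCAB = ["<end>", "a", "bird", "red", "small"]  # hypothetical toy vocabulary


def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation) on a square image, then ReLU."""
    k = len(kernel)
    size = len(image) - k + 1
    return [[max(0.0, sum(image[r + i][c + j] * kernel[i][j]
                          for i in range(k) for j in range(k)))
             for c in range(size)] for r in range(size)]


def max_pool(fmap, p=2):
    """Non-overlapping p x p max pooling."""
    size = len(fmap) // p
    return [[max(fmap[r * p + i][c * p + j] for i in range(p) for j in range(p))
             for c in range(size)] for r in range(size)]


def cnn_features(image, kernel):
    """Convolution -> pooling -> flatten into one feature vector."""
    pooled = max_pool(conv2d(image, kernel))
    return [v for row in pooled for v in row]


def matvec(matrix, vector):
    """Matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_caption(features, w_in, w_hh, w_out, max_len=5):
    """Greedy decoding with a plain RNN whose state is seeded by the features."""
    hidden = [math.tanh(x) for x in matvec(w_in, features)]
    words = []
    for _ in range(max_len):
        hidden = [math.tanh(x) for x in matvec(w_hh, hidden)]
        scores = matvec(w_out, hidden)          # one score per vocabulary word
        token = VOCAB[scores.index(max(scores))]
        if token == "<end>":
            break
        words.append(token)
    return " ".join(words)


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


image = [[rng.random() for _ in range(6)] for _ in range(6)]
# 6x6 image -> 4x4 after the 3x3 convolution -> 2x2 after pooling -> 4 features
features = cnn_features(image, rand_matrix(3, 3))
caption = rnn_caption(features, rand_matrix(3, 4), rand_matrix(3, 3),
                      rand_matrix(len(VOCAB), 3))
print(len(features), repr(caption))
```

In a trained captioning model, the CNN and RNN weights are learned jointly from image-caption pairs, and the decoder conditions each word on the words generated so far.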
Step 2: Image generation through StackGAN
For this image generation task, the input images of the image captioning task act as labels, and the outputs of the image captioning task, i.e. the text descriptions, act as inputs. StackGAN comprises a series of generator and discriminator networks, as you can see in the above image: the initial generator-discriminator pair generates low-resolution images, and the resolution increases as you go deeper.
For more details about StackGAN, you can visit and follow my post Image generation.
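The stacked, coarse-to-fine idea can be sketched at the level of shapes as below. In the StackGAN paper, Stage-I produces 64x64 images and Stage-II refines them to 256x256; this toy keeps the same 4x factor but uses a 4x4 to 16x16 grid, hand-written arithmetic in place of trained networks, and omits the discriminators entirely. The function names and all numeric values are assumptions for illustration.

```python
import random


def stage1_generator(text_embedding, noise, size=4):
    """Stage I (toy): text embedding + noise -> coarse size x size image."""
    cond = text_embedding + noise
    base = sum(cond) / len(cond)
    return [[base + 0.1 * cond[(r * size + c) % len(cond)] for c in range(size)]
            for r in range(size)]


def upsample(image, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    rows, cols = len(image), len(image[0])
    return [[image[r // factor][c // factor] for c in range(cols * factor)]
            for r in range(rows * factor)]


def stage2_generator(low_res, text_embedding, factor=4):
    """Stage II (toy): low-res image + the same text embedding -> larger image.

    The checkerboard term stands in for the detail a trained Stage II adds.
    """
    detail = sum(text_embedding) / len(text_embedding)
    enlarged = upsample(low_res, factor)
    return [[v + 0.01 * detail * ((r + c) % 2) for c, v in enumerate(row)]
            for r, row in enumerate(enlarged)]


rng = random.Random(0)
text_embedding = [1.0, 0.0, 2.0, 1.0]            # hypothetical caption embedding
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]

low = stage1_generator(text_embedding, noise)     # coarse image
high = stage2_generator(low, text_embedding)      # refined, 4x larger per side
print(len(low), len(low[0]), len(high), len(high[0]))
```

Note that the second stage receives the text embedding again along with the first stage's output, so it can add detail that the coarse image lacks rather than merely enlarging it.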
Scope of this project
The main advantage of this approach is that we can generate more specific images. In addition, GANs are not very efficient and require a lot of training time; this approach should help reduce training time and generate better, higher-resolution images.
- StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
- A Comprehensive Survey of Deep Learning for Image Captioning