Prakash Kumar Pandey, Electrical Engineering to Deep Learning

Prakash implemented the NIPS 2017 paper “Toward Multimodal Image-to-Image Translation” and is a winner of the Global NIPS Paper Implementation Challenge. See his code implementation here.

Tell us a little about yourself?

I am a 3rd year undergraduate student in Electrical Engineering at the Indian Institute of Technology, Roorkee (IITR). Despite coming from an electrical engineering background, I have developed a strong interest in Deep Learning for Computer Vision and Natural Language Processing, and I wish to pursue an exciting research career in this field.

How did you get started in AI?

I started Machine Learning in my sophomore year with Andrew Ng’s course on Coursera, which sparked my passion for learning AI. After working with traditional machine learning algorithms for a few months, I moved on to Deep Learning, which I found even more interesting. I am a member of the Paper Discussion Group, an open group at IITR started with the aim of developing a research culture in Deep Learning on campus. We meet twice a week to discuss research papers from renowned conferences such as ICLR, NIPS and CVPR.

What are you most passionate about in the AI industry?

The amazing experimental results featured in papers from the world’s most reputed Deep Learning conferences, and the applications of AI in our day-to-day life, motivate me to do what I do. I am most passionate about Deep Generative Models.

Can you give us an overview of your implementation in the Challenge?

I implemented the paper “Toward Multimodal Image-to-Image Translation”. The aim of the paper is to produce a distribution of output images for a given input image. For this, a cVAE-GAN learns a low-dimensional latent representation of the target images using an encoder network. A generator network then tries to reconstruct the target image from the input image, conditioned on the latent vector, while at the same time we minimise the KL divergence between Q(z|B) and N(z). With a cLR-GAN, we instead generate a target image from the input image and a z sampled randomly from N(z). The generated image is then fed to the encoder network, which gives a point estimate z’ of the latent vector, and we want z’ to be close to the sampled z. The model was trained using the Adam optimiser and batch normalisation with a batch size of 1. Leaky ReLU was used in all the networks. The model was trained on three different datasets, namely edges2shoes, cityscapes and facades.
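To make the two latent-space terms described above more concrete, here is a minimal sketch in plain Python. It is not taken from Prakash's repository. It assumes that the encoder outputs a mean and log-variance for a diagonal Gaussian Q(z|B), that N(z) is a standard normal prior, and that "close" is measured with an L1 distance; the function names and the latent size of 8 are hypothetical and chosen only for illustration.

```python
import math
import random


def kl_to_standard_normal(mu, logvar):
    """KL divergence between a diagonal Gaussian N(mu, exp(logvar)) and N(0, I).

    Assumption: the encoder parameterises Q(z|B) by a mean and a log-variance
    per latent dimension. The result is summed over latent dimensions.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterisation trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def latent_l1(z_sampled, z_recovered):
    """Mean absolute difference between the sampled z and the encoder's estimate z'.

    Assumption: L1 is used as the notion of closeness in the cLR-GAN branch.
    """
    return sum(abs(a - b) for a, b in zip(z_sampled, z_recovered)) / len(z_sampled)


rng = random.Random(0)
latent_dim = 8  # hypothetical latent size, for illustration only

# cVAE-GAN side: the KL term vanishes when Q(z|B) equals the prior,
# and grows as the encoder's mean moves away from zero.
kl_match = kl_to_standard_normal([0.0] * latent_dim, [0.0] * latent_dim)
kl_shifted = kl_to_standard_normal([1.0] * latent_dim, [0.0] * latent_dim)

# cLR-GAN side: sample z from the prior, pretend the encoder recovered
# a slightly shifted z' from the generated image, and measure the gap.
z = reparameterize([0.0] * latent_dim, [0.0] * latent_dim, rng)
z_prime = [v + 0.1 for v in z]
latent_gap = latent_l1(z, z_prime)

print(kl_match, kl_shifted, latent_gap)
```

In a real training loop these two scalars would be weighted and added to the adversarial and image reconstruction losses; the weights and the networks themselves are left out here because the interview does not specify them.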

I trained the model on an Nvidia 1080 Ti GPU. Due to time constraints, the results in my GitHub repository were obtained after only 2 epochs on the edges2shoes dataset. I will train the network again for more epochs, and I will also produce multiple output images for a single input image from the edges2shoes dataset.

Were there any challenges while implementing your selected paper?

Yes, there were many challenges while implementing this paper. There were a lot of bugs in my code, and it took 3–4 days to fix them all. Even then, the quality of the generated images was very poor. I then tried different architectures and hyperparameter settings, and after a few attempts full of trial and error I managed to get good results. While training the model, I realised that Generative Adversarial Networks are really unstable. I also have a few doubts about the theory in the paper that haven’t been cleared up yet (if any of you would be kind enough to help with that, please reach out!).

What’s next for you in your work?

I will train the model on all the datasets used in the original paper. After that, I will try to implement another NIPS paper, “Unsupervised Image-to-Image Translation Networks”. Apart from this, I am working on two other projects: one in NLP and another in Reinforcement Learning.

Prakash is a 3rd year undergraduate student in Electrical Engineering at the Indian Institute of Technology, Roorkee (IITR). To keep up to date with Prakash, check out his GitHub.

This is a feature on one of the winners of the Global NIPS Paper Implementation Challenge. You can read the other winners’ features here. If you enjoyed this series and would like to see more content like this, drop us a comment or an email at

Source: Deep Learning on Medium