GSoC Phase 1

Source: Deep Learning on Medium


Phase 1 of Google Summer of Code 2019 has almost come to an end. During this phase, I worked on two of my five proposed models: Spatial Transformer Networks and VAE-GAN. You can read about Spatial Transformer Networks in my previous blog post: https://medium.com/@manjunathbhat9920/spatial-transformer-network-82666f184299. The code can be found here: https://github.com/thebhatman/Spatial-Transformer-Network.

In this post, I will talk about VAE-GANs and my contributions to Flux and NNlib during Phase 1.

VAE-GAN

A variational autoencoder (VAE) consists of two networks: an encoder that maps a data sample to a latent representation, and a decoder that maps the latent representation back to the data space. A Generative Adversarial Network (GAN) also consists of two networks: a generator that maps a latent representation to the data space, and a discriminator that assigns a probability that a point x in the data space comes from the training data (REAL) rather than from the generator (FAKE). The GAN objective is to find the binary classifier that best discriminates between true data and generated data, while simultaneously encouraging the generator to fit the true data distribution.
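For reference, the GAN objective described above is usually written as a minimax game. This post does not spell it out, so the notation here is an assumption: D is the discriminator, G is the generator, p_data is the true data distribution and p_z is the prior over latent representations.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_{z}} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

The discriminator maximizes this quantity by pushing D(x) towards 1 on real samples and D(G(z)) towards 0 on generated ones, while the generator minimizes it by producing samples that the discriminator scores as real.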

In VAE-GANs, as the name suggests, we combine the ideas behind variational autoencoders and GANs, as shown in the image above. The encoder takes a training sample x as input and produces a latent representation z. The decoder of the VAE doubles as the generator of the GAN: it takes a latent representation z as input and produces a sample in the data space. The discriminator then takes a sample from the data space as input, which is either a sample from the training data or a sample generated by the decoder/generator. The discriminator is trained to assign a probability of 1 to REAL images (from the training data) and a probability of 0 to FAKE images (generated by the decoder/generator).
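To make the discriminator's training target concrete, here is a minimal sketch in plain Python (not the Julia code from my repository). It assumes the discriminator is trained with binary cross-entropy, which is the usual choice but is not stated above. The function names and the example probabilities are hypothetical, made up purely for illustration.

```python
import math

def binary_cross_entropy(p, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(real_probs, fake_probs):
    """Mean BCE with target 1 for REAL samples plus mean BCE with target 0 for FAKE samples."""
    real_loss = sum(binary_cross_entropy(p, 1.0) for p in real_probs) / len(real_probs)
    fake_loss = sum(binary_cross_entropy(p, 0.0) for p in fake_probs) / len(fake_probs)
    return real_loss + fake_loss

# Hypothetical discriminator outputs (probability that the input is REAL).
confident = discriminator_loss(real_probs=[0.9, 0.95], fake_probs=[0.1, 0.05])
confused = discriminator_loss(real_probs=[0.5, 0.5], fake_probs=[0.5, 0.5])

# A discriminator that separates REAL from FAKE gets a lower loss
# than one that outputs 0.5 for everything.
print(confident < confused)
```

A discriminator that cannot tell the two kinds of input apart outputs 0.5 everywhere and pays a loss of 2·log(2). The loss falls as its outputs move towards 1 on training samples and towards 0 on samples from the decoder/generator.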

Here’s my code for VAE-GAN: https://github.com/thebhatman/vae-gan.jl.

I am currently training the model on the CelebA dataset, which consists of celebrity faces.

Contributions to NNlib.jl

Recently there was a major overhaul of NNlib, which changed the API for basic layers such as conv, depthwiseconv, maxpool and meanpool, among others. The convolution layers now had to be called using the DenseConvDims API, and the pooling layers using the PoolDims API. So I worked on adding wrappers for conv, depthwiseconv, maxpool and meanpool in these two PRs: https://github.com/FluxML/NNlib.jl/pull/127 and https://github.com/FluxML/NNlib.jl/pull/121. Both PRs have been merged, and the layers can now be called easily and more intuitively, without explicit use of the DenseConvDims and PoolDims APIs.

Contributions to Flux

I worked on integrating Flux with Zygote, in order to make Zygote the default package for automatic differentiation (AD) in Flux and move away from Tracker entirely (Tracker is currently the AD package in Flux). A good amount of the integration work had already been done, and I continued from there in this PR: https://github.com/FluxML/Flux.jl/pull/791. This PR is almost complete and most of the tests pass, but one test currently fails: the pullback object returned by Zygote.forward(), which is used to compute gradients, throws an error when called on Float16 data but works perfectly on Float32 and Float64. I need to spend more time debugging why exactly this error occurs.

So these are my contributions during Phase 1 so far. I will now start implementing my next proposed model: StarGAN.