Original article was published by Bharat Ahuja on Deep Learning on Medium
We were lucky enough to be given an opportunity to be a part of the program, and a special thanks to Dr. Deepak Garg, Dr. Madhushi Verma, and our mentor Dr. Suneet Gupta for the internship project; without them, our experience and project would not have been a success.
Our proposed solution to the problem above is image inpainting using an Autoencoder/Decoder approach.
What is Image Inpainting?
Image inpainting is the task of filling in “patches” in an image, and it can be used in many areas of image processing. For example, one can remove unwanted parts of an image while keeping its overall integrity intact. Image inpainting has many real-life applications, such as recovering an image corrupted during lossy transmission and various kinds of photo and painting restoration.
This can be done in a number of ways, the traditional one being classical image processing, where the most basic idea is to:
- Detect a patch/region to be filled in the image
- Move pixel by pixel from the boundary of the region toward its center, filling each pixel with colors taken from its adjacent known pixels
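The two steps above can be sketched in plain NumPy. This is a toy “onion-peel” fill written for illustration, not a production inpainting routine; the function name and the mean-of-neighbours rule are our own simplification of the idea:

```python
import numpy as np

def naive_inpaint(img, mask):
    """Toy onion-peel fill: repeatedly fill unknown pixels (mask == True)
    that have at least one known 4-neighbour, using the mean of those
    neighbours, working from the boundary of the hole toward its center."""
    img = img.astype(float).copy()
    known = ~mask.copy()          # True where the pixel value is valid
    h, w = mask.shape
    while not known.all():
        progress = False
        ys, xs = np.where(~known)
        for y, x in zip(ys, xs):
            vals = []
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and known[ny, nx]:
                    vals.append(img[ny, nx])
            if vals:
                # Fill from adjacent known pixels, then mark as known
                img[y, x] = np.mean(vals, axis=0)
                known[y, x] = True
                progress = True
        if not progress:          # hole not reachable from any known pixel
            break
    return img
```

On a flat grey image with a square hole, this reconstructs the hole exactly; on real images it produces the blurry, texture-free fills that motivate the deep learning approach below.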
But a more robust way is to use deep learning, where a neural network fills the patches for us automatically. This in turn gives a better result than the traditional way, because the missing regions are filled with predictions learned from the dataset our deep learning model is trained on.
There are generally two ways to do this with deep learning: GANs (Generative Adversarial Networks) and Autoencoder/Decoder models; our team was allocated the Autoencoder/Decoder method.
So what is an Autoencoder?
An autoencoder is a neural network with three parts: an input layer, a middle/hidden (encoding) layer, and an output (decoding) layer. In essence, we compress our input and then reconstruct the output by decompressing it.
It consists of three layers, each explained below:
- Encoder: In the above architecture, the encoder compresses the input image into a ‘latent’ space representation; that is, it encodes the input image in a reduced dimension.
- Code: This part of the above architecture is the compressed/encoded representation of the input, which is passed on to the decoder.
- Decoder: This layer of the above architecture decodes the image; that is, it upscales the encoded image back to its original dimensions. Some loss is incurred at this stage, since the image must be reconstructed from the latent space representation alone.
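A minimal convolutional autoencoder with these three parts can be sketched in Keras. The layer counts and filter sizes here are illustrative assumptions, since the article does not give the exact configuration of the model:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_autoencoder(input_shape=(32, 32, 3)):
    """Minimal encoder -> code -> decoder sketch (illustrative sizes)."""
    inp = layers.Input(shape=input_shape)
    # Encoder: compress the image into a reduced-dimension representation
    x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inp)
    code = layers.Conv2D(8, 3, strides=2, padding="same", activation="relu")(x)
    # Decoder: upscale the code back to the original dimensions
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same",
                               activation="relu")(code)
    out = layers.Conv2DTranspose(3, 3, strides=2, padding="same",
                                 activation="sigmoid")(x)
    return models.Model(inp, out)

model = build_autoencoder()
# model.compile(optimizer="adam", loss="mse")
# For inpainting, the model would be trained on (damaged, clean) image pairs.
```

The output has the same spatial dimensions as the input, so the network can be trained end-to-end to reconstruct the clean image from the damaged one.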
The task at hand is to remove the need for expensive post-processing and blending operations while dealing robustly with irregular holes/masks, and to produce a meaningful, semantically correct image. For this purpose, we replaced the standard convolutional layers with partial convolution layers and mask updates.
Our approach uses a series of Conv2D layers along with transpose layers to achieve the desired result, but we introduced partial convolution layers in place of the traditional Conv2D layers, updating the mask at each layer, thus creating a PConv2D layer. A partial convolution layer consists of a masked convolution operation that is re-normalized, followed by a mask-update step. This prevents the model from conditioning on the pixels inside the patch/mask. The architecture of the model follows the U-Net structure. The activation function used in the layers is ‘relu’, and for the last layer we used the ‘sigmoid’ function.
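The core of a partial convolution can be sketched in NumPy for a single channel and a single kernel. This is our own simplified illustration of the masked-convolution-plus-renormalization idea, not the actual PConv2D layer used in the model:

```python
import numpy as np

def partial_conv2d(img, mask, kernel):
    """One partial-convolution step on a single channel:
    convolve only over valid pixels (mask == 1), re-normalize by the
    ratio of window size to valid-pixel count, and mark any output
    pixel whose window saw at least one valid pixel as valid."""
    k = kernel.shape[0]
    pad = k // 2
    img_p = np.pad(img * mask, pad)   # zero out the hole before convolving
    mask_p = np.pad(mask, pad)
    out = np.zeros_like(img, dtype=float)
    new_mask = np.zeros_like(mask, dtype=float)
    window_size = float(k * k)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            win = img_p[y:y + k, x:x + k]
            mwin = mask_p[y:y + k, x:x + k]
            valid = mwin.sum()
            if valid > 0:
                # Re-normalize so results don't shrink near the hole
                out[y, x] = (kernel * win).sum() * (window_size / valid)
                new_mask[y, x] = 1.0
            # else: no valid pixels under the window; output stays 0,
            # and the updated mask keeps this pixel marked as a hole
    return out, new_mask
```

Stacking such layers shrinks the hole in the mask at every step, which is why the network no longer needs a separate blending pass over the filled region.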
The prediction model gave better results than traditional Autoencoder/Decoder models. The use of partial convolution layers improved colour correction, kept the edges intact, and spared us the expensive blending methods otherwise needed to create a semantically correct image composition. The result so obtained is shown below.
Experience at LeadingIndia AI