Denoising Autoencoders Explained

Deep Learning Demystified

"Autoencoder" is a term popularized by the many online courses on deep learning. But what are autoencoders? Are they actually of any use? Here, I make an effort to clear up those doubts.

For anyone who has spent even a few months with deep learning, the term autoencoder has surely caught their attention. Many industrial tasks require familiarity with the encoder-decoder architecture, and autoencoders fit right in. They are still an active area of research for the unsupervised learning problem, but they already have some very concrete use cases.

But not all of you will have the same level of experience as people who have been around DL for some time, so I will do my best to keep things simple, if there is any such thing.

Imagine this: you are taking a walk in the park when two people on a motorcycle snatch the gold necklace off your neck. A common incident, right? It could happen to anyone. In fact, it is so common that in 2018, in Delhi, 18 cases of chain snatching were reported per day. In most cases, the culprit is on a vehicle. Vehicles can be tracked down using the license plate number, right? Not if the CCTV camera that is meant to be there for our safety produces images in which the numbers are barely legible. This issue of clarity has rendered hundreds of pieces of footage inadmissible in court and has let culprits get off scot-free.

Enter denoising autoencoders (DAEs). After passing the image of the bike's plate through such a model, we can reasonably expect a legible result. Stacking DAEs yields something called a super-resolution generator, which in essence takes a low-resolution image and produces a high-resolution one. Refer to this link for a demonstration with code: https://keras.io/examples/vision/super_resolution_sub_pixel/

Today I will tell you a bit about how DAEs work, then go through some important parts of the code, and leave you with a notebook that denoises images.

So, what are denoising autoencoders? A DAE is an autoencoder that receives a corrupted data point as input and is trained to predict the original, uncorrupted data point as output. We introduce a corruption process C(x̄ | x), which represents a conditional distribution over corrupted samples x̄ given a data sample x. The autoencoder then learns a reconstruction distribution P_reconstruct(x | x̄) estimated from training pairs (x, x̄), as follows (a short code sketch follows these steps):

1. Sample a training example x from the training data.

2. Sample a corrupted version x̄ from C(x̄ | X = x).

3. Use (x, x̄) as a training example for estimating the autoencoder reconstruction distribution P_reconstruct(x | x̄) = P_decoder(x | h), with h the output of the encoder f(x̄) and P_decoder typically defined by a decoder g(h).
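To make this concrete, here is a minimal sketch of one such training pair, assuming additive Gaussian noise as the corruption process C (the noise model and the 0.2 scale are illustrative assumptions, not the only choice):

```python
import tensorflow as tf

# Step 1: sample a training example x (a random stand-in here for a
# real image with pixel values in [0, 1]).
x = tf.random.uniform((28, 28, 1))

# Step 2: sample a corrupted version x_bar from C(x_bar | x) --
# additive Gaussian noise, clipped back into the valid pixel range.
x_bar = tf.clip_by_value(x + 0.2 * tf.random.normal(x.shape), 0.0, 1.0)

# Step 3: (x_bar, x) is one training pair -- the model receives x_bar
# as input and is trained to reconstruct x.
```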

The last few lines may have been a whirlwind for quite a few of you. If they were not, then voilà, you already know how DAEs work. If they were, fret not; I was in your position not too long ago. Compare this to YouTube videos: your training data is the 144p video frame and your target data is the 1080p video frame. So even if your model errs a bit along the way, you will still get a 540p video. Upgrade!

So how do we create this DAE? I will leave a link to the notebook on my GitHub, so check it out. In this article, I will go through some of the core aspects of the DAE.

The snippet below sketches the convolutional autoencoder class. The dataset we are manipulating is Fashion MNIST. We add a random value to each pixel, effectively adding noise to the dataset. Coming back to our autoencoder class, there are two convolutional layers in the encoder and three in the decoder. Fancy names aside, the purpose of this encoder-decoder structure is to learn a general mapping between a noisy image and a clear one.
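Here is a minimal sketch of such a class matching the description above; the layer widths, kernel sizes, class name Denoise, and noise_factor value are illustrative assumptions, so refer to the notebook for the exact code:

```python
import tensorflow as tf
from tensorflow.keras import layers, losses
from tensorflow.keras.models import Model

# Load Fashion MNIST and scale pixel values to [0, 1].
(x_train, _), (x_test, _) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., tf.newaxis].astype("float32") / 255.0
x_test = x_test[..., tf.newaxis].astype("float32") / 255.0

# Corrupt the images: add Gaussian noise to every pixel, then clip
# back into the valid [0, 1] range.
noise_factor = 0.2
x_train_noisy = tf.clip_by_value(
    x_train + noise_factor * tf.random.normal(shape=x_train.shape), 0.0, 1.0)
x_test_noisy = tf.clip_by_value(
    x_test + noise_factor * tf.random.normal(shape=x_test.shape), 0.0, 1.0)


class Denoise(Model):
    """Convolutional denoising autoencoder: two convolutional layers
    in the encoder, three layers in the decoder."""

    def __init__(self):
        super().__init__()
        # Encoder: downsamples 28x28x1 to a compact representation.
        self.encoder = tf.keras.Sequential([
            layers.Input(shape=(28, 28, 1)),
            layers.Conv2D(16, (3, 3), activation='relu',
                          padding='same', strides=2),
            layers.Conv2D(8, (3, 3), activation='relu',
                          padding='same', strides=2)])
        # Decoder: upsamples back to a 28x28x1 image.
        self.decoder = tf.keras.Sequential([
            layers.Conv2DTranspose(8, kernel_size=3, strides=2,
                                   activation='relu', padding='same'),
            layers.Conv2DTranspose(16, kernel_size=3, strides=2,
                                   activation='relu', padding='same'),
            layers.Conv2D(1, kernel_size=(3, 3), activation='sigmoid',
                          padding='same')])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
```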

Next, we compile the model and train it using the fit method. We have defined our class so that it inherits from the TensorFlow Model class, which is what gives it the compile and fit methods.
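Continuing the sketch above, training pairs each noisy image with its clean original (the optimizer, loss, and epoch count here are assumptions, not necessarily the notebook's settings):

```python
# The noisy images are the input; the clean originals are the target.
autoencoder = Denoise()
autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())
autoencoder.fit(x_train_noisy, x_train,
                epochs=10,
                shuffle=True,
                validation_data=(x_test_noisy, x_test))

# Denoise the held-out noisy images.
denoised = autoencoder.predict(x_test_noisy)
```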

I would encourage all of you to code along as you read, so that if you face any problems you can fix them on the spot instead of later searching for where the error occurred and trying to recall the logic behind the code.

Please feel free to refer to my GitHub repository for any kind of assistance in this matter: https://github.com/sreyan-ghosh/tensorflow_files/blob/master/Others/autoencoders_tf.ipynb

It would also be a pleasure to connect with you all on LinkedIn: https://www.linkedin.com/in/sreyan-ghosh-b0722a18b/