Improving PewDiePie’s camera quality with Autoencoders

Let’s take a look at how we can use Deep Learning for Image Super-Resolution with Autoencoders.

Comparison of the 480p input (left) with the output of an Autoencoder trained for image super-resolution: a higher-quality image at the same resolution (right).

Recently, I have been reading about various image super resolution techniques that utilize Deep Learning for improving image clarity. Some very impressive results have been achieved using techniques like GANs and Autoencoders for this task. It is safe to presume that most smartphone cameras and other image processing software these days make use of such AI to “enhance” images.

In this article, I would like to explore and detail how effective Autoencoders are for this task and demonstrate some results on a recent video of PewDiePie.

Why PewDiePie?

If you have been following the most-subscribed YouTuber recently, you would know that he gets a lot of criticism and ridicule for the low visual quality of his videos, in spite of using expensive camera gear to record them.

This made me think it would be the perfect use-case to play with super-resolution AI algorithms and see how much we can improve the quality of his videos with them.

What is Image Super-Resolution?

Single image super-resolution is the technique by which a blurry low-resolution (LR) image is up-scaled to produce a sharper, more detailed super-resolved (SR) image. The aim is to recover information about the objects in the image that has been lost due to poor camera quality or poor lighting conditions.

An example of image super resolution using Neural Networks (ESRGAN). [Source]

Convolutional Neural Networks (CNNs) have proven to be rather good at such tasks, especially compared to more traditional interpolation techniques. With their ability to learn the shapes and textures of common objects, CNNs are very effective at recovering information that may not even be present in the LR images. So, let's take a look at how we can train a CNN-based autoencoder for this task.

Gathering Training Data

First, let's take a look at our training data. We will use pairs of Low Resolution and High Resolution (LR-HR) images of Pewds to train our Autoencoder network.

Generating training data

It is fairly easy to gather this type of data even though this is Supervised Learning. We only need high-resolution images; their low-resolution counterparts can be generated with a simple downscaling followed by an upscaling operation on each image, as described in the figure above. This gives us our input-output pairs for network training.
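To make this concrete, here is a minimal sketch of the pairing step in Python with OpenCV. The scale factor, directory layout, and helper names are illustrative assumptions rather than the article's exact code.

```python
import os
import cv2
import numpy as np

def make_lr_hr_pair(hr_image, scale=4):
    """Create a blurry low-resolution input from a high-resolution frame.

    The HR frame is downscaled by `scale` and then upscaled back to its
    original size, so the LR input and the HR target have identical dimensions.
    """
    h, w = hr_image.shape[:2]
    small = cv2.resize(hr_image, (w // scale, h // scale), interpolation=cv2.INTER_AREA)
    lr_image = cv2.resize(small, (w, h), interpolation=cv2.INTER_CUBIC)
    return lr_image, hr_image

def build_dataset(frame_dir, scale=4):
    """Build arrays of (LR input, HR target) pairs from a folder of video frames."""
    lr_frames, hr_frames = [], []
    for name in sorted(os.listdir(frame_dir)):
        hr = cv2.imread(os.path.join(frame_dir, name))
        if hr is None:
            continue
        lr, hr = make_lr_hr_pair(hr, scale)
        lr_frames.append(lr.astype(np.float32) / 255.0)   # normalize to [0, 1]
        hr_frames.append(hr.astype(np.float32) / 255.0)
    return np.array(lr_frames), np.array(hr_frames)
```

The downscale-then-upscale trick is what makes this self-supervised in practice: the HR frame doubles as the training label, so no manual annotation is needed.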

Autoencoder Architecture

The Autoencoder network contains two main blocks — the Encoder and the Decoder. The entire Encoder-Decoder network setup is shown in the following figure.
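The figure gives the exact layer configuration; as a rough companion, below is a minimal Keras sketch of a convolutional encoder-decoder of this kind, trained to map the blurry LR input to the sharper HR target. The filter counts, kernel sizes, and input shape are illustrative assumptions, not the article's exact settings.

```python
from tensorflow.keras import layers, Model

def build_autoencoder(input_shape=(256, 256, 3)):
    """A minimal convolutional encoder-decoder for image super-resolution.

    The encoder compresses the input with strided convolutions; the decoder
    upsamples back to the original size and predicts the sharpened image.
    """
    inputs = layers.Input(shape=input_shape)

    # Encoder: progressively shrink spatial size while increasing channels
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)

    # Decoder: upsample back to the input resolution
    x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)

    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")  # pixel-wise reconstruction loss
    return model
```

With the LR-HR arrays from the previous step, training is then a single call such as `model.fit(lr_frames, hr_frames, batch_size=8, epochs=20)`, after which `model.predict` produces the sharpened frames.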