Source: Deep Learning on Medium
“Deep learning requires a large amount of data.” This phrase has become popular among people who consider applying deep learning methods to their data. Concerns about not having “big” enough data mainly derive from the common belief that deep learning only works with massive amounts of data. Well, this is not true.
Although in some cases you really do need massive amounts of data, some networks can be trained on a single image. On top of that, in practice, even without large datasets, the structure of the network itself can prevent deep networks from over-fitting.
This post is about “Deep Image Prior”, a fascinating paper by Dmitry Ulyanov, Andrea Vedaldi and Victor Lempitsky that was published at CVPR 2018. The paper shows that the structure of a CNN is sufficient to solve image restoration problems. In simple words, the paper claims that the structure of a CNN itself contains “knowledge” of natural images. The authors then use this claim for image restoration tasks such as denoising, super-resolution, in-painting and more.
In this post I’ll cover three things. First, an overview of image restoration tasks and some use cases. Second, an overview of “Deep Image Prior” and how it can be used for image restoration tasks. Finally, we will perform a denoising task using the “Deep Image Prior: Image restoration with neural networks but without learning” GitHub repository, which is implemented in PyTorch.
When we refer to image restoration problems, we basically mean that we have a degraded image and want to recover the clean, non-degraded image. An image can get degraded for many reasons; mainly, degradation occurs during image formation, transmission, and storage.
There are many tasks in image restoration. Let’s talk about three main ones:
Denoising and general reconstruction
Image denoising refers to an attempt to restore images contaminated by additive noise or by other sources of degradation such as compression.
Super-resolution
The goal of super-resolution is to take a low-resolution image and up-sample it to create a high-resolution version.
In-painting
Image in-painting is the process of reconstructing lost or deteriorated parts of images and videos. This technique is often used to remove unwanted objects from an image or to restore damaged portions of old photos. The figures below show example image in-painting results.
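To make the three settings concrete, here is a minimal NumPy sketch of how each degradation could be simulated. The random 8x8 array, the noise level and the masked region are all assumptions chosen for illustration; they do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.random((8, 8))  # stand-in for a grayscale image in [0, 1]

# Denoising setting: additive Gaussian noise (sigma = 0.1 is an assumed value).
noisy = clean + rng.normal(0.0, 0.1, size=clean.shape)

# Super-resolution setting: 2x down-sampling by averaging each 2x2 block.
low_res = clean.reshape(4, 2, 4, 2).mean(axis=(1, 3))

# In-painting setting: a binary mask marks which pixels survive.
mask = np.ones_like(clean)
mask[2:5, 3:6] = 0.0  # hypothetical damaged region
damaged = clean * mask
```

In every case the restoration task is to go back from `noisy`, `low_res` or `damaged` to something close to `clean`.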
Of course, there are a whole lot more use cases, but for now let us try to understand the paper’s novel technique.
Deep Image Prior
1. What is the “Prior”?
Imagine that you need to perform a super-resolution task by yourself. For example, you are given a low-resolution image (the left image in Figure 4 below), a pen and paper, and are asked to resolve it. Hopefully this is what you will draw (the right image in Figure 4 below).
So, how do you do that? You would probably use your knowledge of the world: what a face is and how it is structured, i.e. the location of the eyes, nose, mouth, etc. You would also use specific information from the low-resolution image. So we can define a prior, intuitively, as our basic beliefs in the absence of information. For example, in the case of images, a prior over images basically represents what we think natural images should look like.
2. Learned and explicit priors
If you want a computer to do image restoration, e.g. image denoising, you will probably collect a large dataset of clean and noisy images and train a deep neural network to take a noisy image as input and produce a clean image as output. In this case the network learns the prior from the dataset. This approach is called a learned prior.
The problem is that this approach requires a massive number of noisy and clean image pairs.
Another way to solve this task is to use an explicit or handcrafted prior, where we do not need any data other than our image.
We can think of this as an optimization problem whose solution should approximate the desired clean image x: we aim to create an image x* that is close to the noisy image x^ but is also “natural” or “clear” like the clean image x.
For example, we can measure the “closeness” with a data term E(x, x^), using the l2 distance between pixel values for denoising, or another task-dependent data term.
In addition to the data term, let’s assume there is a function R(x) that measures the “unnaturalness” or “unclearness” of an image. In this case, our optimization objective is the maximum a posteriori (MAP) estimate of the unobserved clean image given the observed data:
x* = argmin_x E(x, x^) + R(x)
The data term pulls x towards the observed image, making sure that the result does not deviate too far from it. The right-hand term, R(x), pulls x in the direction of natural images, (hopefully) reducing the noise. So we can think of R(x) as a regularization term. Without it, the optimizer will “overfit” to the noisy image. Hence, the way we define our prior/regularization term is crucial to obtaining good results.
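The interplay between the two terms can be sketched on a 1-D signal. In the snippet below, E is the squared l2 distance to the noisy signal and R is the sum of squared differences between neighbouring samples. This smoothness penalty is only an assumed stand-in for a handcrafted prior, and the weight `lam`, the step size and the test signal are illustrative choices.

```python
import numpy as np

def restore(noisy, lam, steps=500, lr=0.02):
    """Gradient descent on E(x, x^) + lam * R(x), starting from zeros."""
    x = np.zeros_like(noisy)
    for _ in range(steps):
        grad_data = 2.0 * (x - noisy)      # gradient of ||x - x^||^2
        diff = np.diff(x)                  # x[i+1] - x[i]
        grad_reg = np.zeros_like(x)
        grad_reg[:-1] -= 2.0 * diff        # d/dx[i]   of (x[i+1] - x[i])^2
        grad_reg[1:] += 2.0 * diff         # d/dx[i+1] of the same term
        x -= lr * (grad_data + lam * grad_reg)
    return x

def mse(a, b):
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 200))
noisy = clean + rng.normal(0.0, 0.3, size=clean.shape)

regularized = restore(noisy, lam=5.0)    # data term plus prior
unregularized = restore(noisy, lam=0.0)  # data term only
```

With `lam=0.0` the optimizer simply converges to the noisy signal, which is the “overfitting” described above; with the regularizer switched on, the result lands closer to the clean sine wave.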
Unfortunately, we do not have an exact prior over natural images. Traditionally, we have used hand-crafted features to represent the prior, but these always involve some level of arbitrariness. The essence of this paper is that CNNs can be used as priors over images; in other words, CNNs in some way “know” what natural images should and should not look like.
3. Network structure to define a prior
So, we are presented with the task of minimizing this function over images x:
min_x E(x, x^) + R(x)
The conventional approach is to minimize this function in image space, starting from an initial estimate in that space: we initialize the image with noise, compute the gradient of the function with respect to x, update x, and repeat until convergence.
But can we do it differently? We can say that every image x is the output of a function g that maps a value from a different space to image space:
x = g(θ)
Here we have a parameter space of values θ and a mapping g from parameters θ to images x. Instead of optimizing over the image, the optimization is done over θ:
min_θ E(g(θ), x^) + R(g(θ))
In Fig. 8 we can see that we start from an initial value in parameter space and immediately map it to image space, compute the gradient of the objective with respect to θ through g(.), update θ using gradient descent, and repeat until convergence.
So, why would we want to do that? What is the difference between optimizing in image space and in parameter space? The function g(.) can be treated as a hyper-parameter that can be tuned to favor the images we want to get, i.e. “natural” images. If we think about it, the function g(θ) itself defines a prior. Thus, instead of optimizing the sum of two components, we now optimize only the data term.
We can define a network structure, for example U-Net or ResNet, and define θ as the network parameters. We then express our minimization problem as follows:
θ* = argmin_θ E(f_θ(z), x^),   x* = f_θ*(z)
where f_θ is the network with weights θ, z is a fixed random input image, and θ are the randomly initialized weights, which are updated using gradient descent to obtain the desired output image.
Get the idea? Here the variable is θ! Unlike the usual setting, where you fix the weights and vary the inputs to get different outputs, here the input is fixed and the weights are varied to get different outputs. This is how we obtain the mapping g(.) to image space.
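As a rough PyTorch sketch of this idea: the input z is created once and never touched, while the optimizer only updates the network weights θ. The tiny two-layer network, the random target and the learning rate are assumptions for illustration; the paper uses a much deeper encoder-decoder.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny network f_theta (not the architecture from the paper).
net = nn.Sequential(
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
    nn.Sigmoid(),
)

z = torch.rand(1, 4, 16, 16)        # fixed random input, never updated
z_before = z.clone()
target = torch.rand(1, 1, 16, 16)   # stand-in for the degraded image x^

optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
losses = []
for _ in range(100):
    optimizer.zero_grad()
    loss = ((net(z) - target) ** 2).mean()  # data term only, no R(x)
    loss.backward()
    optimizer.step()                         # only theta changes
    losses.append(loss.item())
```

Note that there is no explicit regularizer in the loss: whatever prior exists comes from the network structure itself.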
4. Why use this parameterization?
Still, it is not obvious why we should consider this parameterization. In theory, at first glance, it seems that the network would simply reproduce the original noisy image. In the paper, the authors conducted an experiment showing that when gradient descent is used to optimize the network, the convolutional neural network resists fitting noisy images and descends much more quickly and easily towards natural-looking images.
Each curve shows how the loss changes as the network is fit to a natural image, to noise, and to an image with noise added. The figure shows that the loss converges much faster for natural images than for noise. This means that if we stop the optimization at an appropriate point, we obtain a “natural” image. This is why the paper regards CNNs as a prior: they (somehow) have a bias towards producing natural images. This allows us to use a CNN decoder as a way of generating natural images under some restrictions.
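A similar effect can be seen in a toy linear analogy. This is not the paper’s experiment and involves no CNN: the mapping g(θ) = Aθ below is just a circular Gaussian blur, picked as an assumption because gradient descent through a smoothing map fits smooth targets much faster than it fits noise.

```python
import numpy as np

def blur_matrix(n, sigma=2.0):
    """Circulant matrix that applies a normalized circular Gaussian blur."""
    idx = np.arange(n)
    dist = np.minimum(idx, n - idx)              # circular distance to index 0
    kernel = np.exp(-0.5 * (dist / sigma) ** 2)
    kernel /= kernel.sum()
    return np.stack([np.roll(kernel, i) for i in range(n)])

def relative_loss_curve(A, target, steps=50, lr=1.0):
    """Gradient descent on 0.5 * ||A theta - target||^2 from theta = 0."""
    theta = np.zeros(A.shape[1])
    base = np.sum(target ** 2)
    curve = []
    for _ in range(steps):
        residual = A @ theta - target
        curve.append(float(np.sum(residual ** 2) / base))
        theta -= lr * (A.T @ residual)
    return curve

n = 128
A = blur_matrix(n)
rng = np.random.default_rng(0)
smooth_target = np.sin(2.0 * np.pi * 2.0 * np.arange(n) / n)  # "natural" signal
noise_target = rng.normal(size=n)                             # pure noise

smooth_curve = relative_loss_curve(A, smooth_target)
noise_curve = relative_loss_curve(A, noise_target)
```

After the same number of steps, the relative loss for the smooth target is essentially zero while a large part of the noise target is still unfitted. Stopping early therefore keeps the smooth content and leaves most of the noise out, which is the intuition behind cutting off the optimization in Deep Image Prior.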
Let’s look at some results for common tasks.
Blind restoration of a JPEG-compressed image
Deep image prior can restore an image with a complex degradation (JPEG compression in this case). As the optimization progresses, the deep image prior recovers most of the signal while getting rid of halos and blockiness (after 2400 iterations), before eventually overfitting to the input (at 50K iterations).
In the image below, in-painting is used to remove text overlaid on an image. The deep image prior produces an almost perfect result with virtually no artifacts.
The deep image prior is successful at recovering both man-made and natural patterns.
Implementation of Deep Image Prior in PyTorch
Now that we have seen the concept and math behind Deep Image Prior, let’s implement it and perform a denoising task in PyTorch. The entire project is available in the “Deep Image Prior: Image restoration with neural networks but without learning” GitHub repository.
The notebook structure is as follows:
Pre-processing
The first few cells import libraries, so make sure you have all dependencies installed correctly. The libraries you need to install to run the code are listed in the GitHub repository. This is also where you choose the image to be denoised.
In this example I’ve chosen an image with shot noise applied to one half using the Shot-Noise-Generator GitHub repository, shown below.
The optimization cell is where the magic happens. The random input z stays fixed, and the network output is recomputed repeatedly inside the closure() function. Each time, the data term (MSE in our case) is computed and the parameters θ get updated.
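Here is a minimal sketch of what such a closure-based loop could look like. It is not the repository’s code: the small network, the random stand-in for the noisy image, the names `net_input`, `history` and `snapshots`, and the iteration count are all assumptions made to keep the example self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in noisy image x^ (3 channels, 16x16); the notebook loads a real photo.
noisy_img = torch.rand(1, 3, 16, 16)

# Hypothetical small network; the repository builds a deeper encoder-decoder.
net = nn.Sequential(
    nn.Conv2d(8, 16, 3, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 16, 3, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 3, 1), nn.Sigmoid(),
)
net_input = torch.rand(1, 8, 16, 16) * 0.1  # fixed z
mse = nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
history, snapshots = [], []

def closure():
    """Evaluate the data term once; gradients flow to theta only."""
    optimizer.zero_grad()
    out = net(net_input)
    loss = mse(out, noisy_img)
    loss.backward()
    history.append(loss.item())
    snapshots.append(out.detach().clone())  # keep frames to inspect progress
    return loss

num_iter = 50  # assumed value; stopping early is what avoids fitting the noise
for _ in range(num_iter):
    optimizer.step(closure)
```

Passing the closure to `optimizer.step` keeps the forward pass, the loss and the bookkeeping in one place, and the stored snapshots make it easy to pick the iteration that looks best.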
The cell generates an image per iteration so you can track the progress of the optimization. When you are satisfied with the results, stop the process and run the final cell to create an awesome GIF visualizing the entire process.
Below are the results of the denoising task we implemented: on the right, the noisy image, and on the left, the restoration progress step by step.
If you’re interested in the source code, it can be found in my “Deep Image Prior: Image restoration with neural networks but without learning” GitHub repository.
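One possible way to build such a GIF is with Pillow, sketched below. Using Pillow here is an assumption (the notebook may use a different tool), and the random frames, the helper name `save_gif` and the output file name are placeholders for the per-iteration outputs.

```python
import os
import tempfile

import numpy as np
from PIL import Image

def save_gif(frames, path, ms_per_frame=100):
    """Write a list of HxWx3 float arrays in [0, 1] as an animated GIF."""
    images = [
        Image.fromarray((np.clip(f, 0.0, 1.0) * 255).astype(np.uint8))
        for f in frames
    ]
    images[0].save(
        path,
        save_all=True,             # write every frame, not just the first
        append_images=images[1:],
        duration=ms_per_frame,     # display time per frame in milliseconds
        loop=0,                    # loop forever
    )

# Stand-in frames; in practice these would be the saved network outputs.
rng = np.random.default_rng(0)
frames = [rng.random((16, 16, 3)) for _ in range(5)]
gif_path = os.path.join(tempfile.gettempdir(), "dip_progress.gif")
save_gif(frames, gif_path)
```

If the network outputs are PyTorch tensors of shape (3, H, W), they would first need to be moved to NumPy and transposed to (H, W, 3).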
As always, if you have any questions or comments feel free to leave your feedback below or you can always reach me on LinkedIn.
Till then, see you in the next post! 😄