A Deep Journey into Super-resolution


SRCNN/IRCNN/DnCNN Architecture
ESPCN Architecture

Linear Networks:

The simplest and earliest network designs are linear networks. Their architecture is a single chain of layers, without any skip connections or multiple branches. The up-sampling operation in linear networks is performed either early in the network (early up-sampling) or at the end (late up-sampling), as the sketch below illustrates.
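To make the two design choices concrete, here is a minimal PyTorch sketch (layer widths and kernel sizes are illustrative, not taken from any specific paper): an SRCNN-style early-upsampling model that interpolates first and refines in HR space, versus an ESPCN-style late-upsampling model that works in LR space and rearranges channels into space at the very end.

```python
import torch.nn as nn
import torch.nn.functional as F

class EarlyUpsamplingSR(nn.Module):
    """SRCNN-style: interpolate first, then refine in HR space."""
    def __init__(self, scale=4):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 3, 5, padding=2),
        )

    def forward(self, lr):
        # every convolution below runs at the (expensive) HR resolution
        x = F.interpolate(lr, scale_factor=self.scale, mode="bicubic")
        return self.body(x)

class LateUpsamplingSR(nn.Module):
    """ESPCN-style: process in LR space, upsample only at the end."""
    def __init__(self, scale=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 3 * scale ** 2, 3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(scale)  # channels -> spatial detail

    def forward(self, lr):
        return self.shuffle(self.body(lr))
```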

EDSR/MDSR/SR-ResNet Architecture
CARN Architecture

Residual Networks:

To avoid vanishing gradients and to enable very deep networks, residual networks use skip connections in their design. The network learns the residual, i.e. the high-frequency difference between the (upsampled) input and the ground truth. Such networks are categorized into single-stage and multi-stage networks based on the number of stages.
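A minimal sketch of global residual learning in PyTorch, assuming the common VDSR-style setup (depth and width here are illustrative): the network predicts only the missing high frequencies, which are added back to the interpolated input via a skip connection.

```python
import torch.nn as nn

class GlobalResidualSR(nn.Module):
    def __init__(self, depth=8, channels=64):
        super().__init__()
        layers = [nn.Conv2d(3, channels, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(channels, 3, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, upsampled_lr):
        # skip connection: the body only has to learn the residual
        return upsampled_lr + self.body(upsampled_lr)
```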

BTSRN Architecture
RED-Net Architecture
DRCN Architecture
DRRN Architecture

Recursive Networks:

The main motivation behind recursive networks is to break the harder SR problem down into a set of simpler ones. Recursive networks employ either recursively connected convolutional layers or recursively linked units, sharing weights across recursions.
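A minimal sketch of the recursive idea, loosely DRCN-style (the channel width and recursion count are illustrative): a single convolutional unit is applied repeatedly with shared weights, so the effective depth grows without adding parameters.

```python
import torch.nn as nn

class RecursiveSR(nn.Module):
    def __init__(self, channels=64, recursions=5):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.shared = nn.Sequential(  # this unit is reused at every step
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)
        self.recursions = recursions

    def forward(self, x):
        h = self.head(x)
        for _ in range(self.recursions):
            h = self.shared(h)  # same weights in every recursion
        return self.tail(h)
```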

MemNet Architecture
LapSRN Architecture

Progressive Reconstruction Designs:

For large scaling factors, predicting the output with a CNN in a single step is often not feasible. For such factors, the algorithms reconstruct the HR image in multiple steps, i.e. 2× followed by 4× and so on.
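A minimal sketch of progressive reconstruction, assuming a LapSRN-like scheme (the stage layout is illustrative): each stage upsamples by 2×, so two chained stages yield 4×, and the intermediate 2× output can be supervised as well.

```python
import torch.nn as nn

def make_stage(channels=64):
    """One 2x reconstruction stage."""
    return nn.Sequential(
        nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
        nn.Conv2d(channels, 3 * 4, 3, padding=1),
        nn.PixelShuffle(2),  # 2x upsampling per stage
    )

class Progressive4x(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = make_stage()  # LR -> 2x
        self.stage2 = make_stage()  # 2x -> 4x

    def forward(self, lr):
        sr2 = self.stage1(lr)
        sr4 = self.stage2(sr2)
        return sr2, sr4  # both outputs can receive a loss
```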

RDN Architecture
SRDenseNet Architecture

Densely Connected Networks:

Densely connected networks are inspired by the DenseNet architecture for image classification. They combine hierarchical cues available along the network depth to achieve high flexibility and richer feature representations.
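A minimal sketch of a dense block, assuming DenseNet-style concatenation (growth rate and layer count are illustrative): each layer receives the features of all preceding layers, and a 1×1 convolution fuses the accumulated features back down.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, channels=64, growth=32, num_layers=4):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, 3, padding=1)
            for i in range(num_layers)
        )
        # 1x1 fusion conv restores the original channel count
        self.fuse = nn.Conv2d(channels + num_layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            # each layer sees the concatenation of everything before it
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return self.fuse(torch.cat(feats, dim=1))
```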

DBPN Architecture
CNF Architecture

Multi-branch Designs:

Multi-branch networks aim to obtain a diverse set of features at multiple context scales. This complementary information is then fused to obtain better HR reconstructions. The design also enables multi-path signal flow, leading to better information exchange during the forward and backward passes of training. Multi-branch designs are becoming common in several other computer vision tasks as well.
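A minimal sketch of a multi-branch block (kernel sizes are illustrative): parallel branches with different receptive fields capture context at different scales, and a 1×1 convolution fuses them.

```python
import torch
import torch.nn as nn

class MultiBranchBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)  # local context
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)  # wider context
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # merge the branches

    def forward(self, x):
        b3 = torch.relu(self.branch3(x))
        b5 = torch.relu(self.branch5(x))
        return self.fuse(torch.cat([b3, b5], dim=1))
```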

IDN Architecture
SelNet Architecture
RCAN Architecture
DRLN Block

Attention-based Networks:

The network designs discussed so far treat all spatial locations and channels as having uniform importance for SR. In several cases, it helps to selectively attend to only a few features at a given layer. Attention-based models provide this flexibility, recognizing that not all features are essential for super-resolution; rather, they have varying importance. Coupled with deep networks, recent attention-based models have shown significant improvements for SR.
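A minimal sketch of channel attention, loosely following the squeeze-and-excitation scheme used in RCAN (the reduction ratio is illustrative): global average pooling produces one descriptor per channel, which is mapped to per-channel weights in (0, 1).

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.attend = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # squeeze: B x C x 1 x 1
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),  # per-channel importance weights in (0, 1)
        )

    def forward(self, x):
        return x * self.attend(x)  # rescale each channel by its importance
```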

SRMD Architecture

Multiple-degradation Handling Networks:

All the designs discussed so far assume bicubic degradation. In reality, however, this is rarely a realistic assumption, as multiple degradations (e.g. blur and sensor noise) can occur simultaneously.
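A minimal sketch of how SRMD feeds degradation information to the network (tensor shapes here are assumptions for illustration): the LR image is concatenated with degradation maps, e.g. a stretched blur-kernel code and a per-image noise level, so a single network can handle multiple degradations.

```python
import torch

def build_srmd_input(lr, kernel_code, noise_level):
    """lr: B x 3 x H x W, kernel_code: B x K (e.g. a PCA-projected blur
    kernel), noise_level: B (one noise sigma per image)."""
    b, _, h, w = lr.shape
    # stretch each per-image code into a full-resolution map
    k_maps = kernel_code.view(b, -1, 1, 1).expand(-1, -1, h, w)
    n_map = noise_level.view(b, 1, 1, 1).expand(-1, -1, h, w)
    return torch.cat([lr, k_maps, n_map], dim=1)  # B x (3 + K + 1) x H x W
```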

SRGAN Architecture
SRFeat Architecture

Generative Adversarial Networks:

Generative adversarial networks have two components, a generator and a discriminator. The generator creates SR images that the discriminator cannot distinguish from real HR images, i.e. it cannot tell whether an image is a genuine HR image or an artificially super-resolved output. In this manner, HR images with better perceptual quality are generated. The corresponding PSNR values are generally degraded, which highlights the problem that the prevalent quantitative measures in the SR literature do not capture the perceptual soundness of the generated HR outputs.
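A minimal sketch of the generator's loss in such a setup (the loss choices and weighting are illustrative; `generator` and `discriminator` are assumed to be defined elsewhere): a pixel-level term keeps the output close to the ground truth, while the adversarial term rewards outputs the discriminator labels as real.

```python
import torch
import torch.nn.functional as F

def generator_loss(generator, discriminator, lr, hr, adv_weight=1e-3):
    sr = generator(lr)
    pixel = F.l1_loss(sr, hr)  # keeps the output faithful to the ground truth
    logits = discriminator(sr)
    # adversarial term: push super-resolved images towards the "real" label
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return pixel + adv_weight * adv
```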

Experimental Evaluations:

This section compares more than 30 state-of-the-art algorithms on six challenging datasets: Set5, Set14, BSD100, Urban100, DIV2K, and Manga109.

Number of Parameters:

The algorithms were evaluated using the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). Methods with direct reconstruction perform one-step upsampling from the LR to the HR space, while progressive reconstruction predicts the HR image in multiple upsampling steps. Depth denotes the number of convolutional and transposed-convolutional layers in the longest path from input to output for 4× SR. Global residual learning (GRL) indicates that the network learns the difference between the ground-truth HR image and the LR image upsampled via bicubic interpolation or learned filters. Local residual learning (LRL) refers to local skip connections between intermediate convolutional layers.
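For reference, a minimal sketch of the PSNR computation used throughout these tables, assuming images are float tensors scaled to [0, 1]:

```python
import torch

def psnr(sr, hr, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = torch.mean((sr - hr) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```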

Comparison of CNN-based SR algorithms based on various parameters.

As one can notice, methods that perform late upsampling have considerably lower computational cost compared to methods that perform upsampling earlier in the network pipeline.

Comparison of multiplication-addition operations in various SR networks. Note that FLOPs are roughly double the number of mult-adds, and algorithmic runtime (during inference) is proportional to the mult-add operations.
Comparison of several parameters in various SR architectures. The memory footprint and training time of a model are directly related to its number of tunable parameters.

Quantitative Measures:

Mean PSNR and SSIM for the SR methods evaluated on the benchmark datasets. The '-' indicates that the method is not suited to the images of the corresponding dataset, used the dataset during training, or has no publicly available source code.
Mean PSNR and SSIM for the SR methods evaluated on the benchmark datasets for a higher super-resolution factor of 4×.

DRLN delivers the best PSNR and SSIM for 2× and 3×, and ESRGAN for 4×. However, it is difficult to declare one algorithm a clear winner over the rest, as many factors are involved, such as network complexity, network depth, training data, training patch size, number of feature maps, etc. A fair comparison is only possible by keeping all these parameters consistent.

A visual comparison between a few of the state-of-the-art algorithms which aim to improve the PSNR of the images.

Super-resolution comparison on 8× and 4× sample images with sharp edges and texture, taken from Urban100.

The outputs of the GAN-based algorithms, which are perceptually driven and aim to enhance the visual quality of the generated images, are shown below.

Qualitative comparison for generative adversarial network algorithms for 4× super-resolution.

As one can notice, the outputs are generally crisp, but the corresponding PSNR values are lower than those of methods that optimize pixel-level loss measures.

8× Super-resolution:

For higher magnification levels, the artifacts in the images become more visible.

The performance of state-of-the-art algorithms on widely used publicly available datasets, in terms of PSNR (in dB) and SSIM for 8×.

It is clear from the images that most state-of-the-art algorithms struggle to reproduce textures in highly magnified versions of the images.

Choice of Network Loss:

The most popular choices of network loss for CNN-based image super-resolution are the mean squared error (l2) and the mean absolute error (l1). Generative adversarial networks (GANs) additionally employ a perceptual (adversarial) loss on top of pixel-level losses such as the MSE. The initial CNN methods were trained using the l2 loss; however, the trend has recently shifted towards l1, which has been shown to be more robust than l2. The reason is that l2 puts more emphasis on the most erroneous predictions, while l1 weights the error distribution more evenly. The sketch below contrasts the two.
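A minimal sketch contrasting the two pixel losses (the toy tensors are only for illustration): the gradient of the l2 loss scales with the error itself, so large errors dominate training, while l1 gives every pixel error the same gradient magnitude.

```python
import torch
import torch.nn.functional as F

sr = torch.rand(1, 3, 32, 32)  # toy prediction
hr = torch.rand(1, 3, 32, 32)  # toy ground truth

l2 = F.mse_loss(sr, hr)  # mean squared error: emphasizes large errors
l1 = F.l1_loss(sr, hr)   # mean absolute error: weights errors more evenly
```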

Network Depth:

Contrary to the claim made in SRCNN that network depth does not improve performance and sometimes even degrades quality, VDSR first showed that deeper networks yield better PSNR and image quality. EDSR further reinforced this claim by increasing the number of convolutional layers to nearly four times that of VDSR. Recently, RCAN employed more than four hundred convolutional layers to enhance image quality. The current batch of CNNs keeps incorporating more convolutional layers to build deeper networks that improve both image quality and quantitative scores, and this trend has remained dominant in deep SR since the inception of SRCNN.

Skip Connections:

Overall, skip connections have played a vital role in improving SR results. These connections can be broadly categorized into four main types: global, local, recursive, and dense connections. Initially, VDSR utilized global residual learning (GRL) and showed an enormous performance improvement over SRCNN. Further, DRRN and DRCN demonstrated the effectiveness of recursive connections. More recently, EDSR and RCAN employed local residual learning (LRL), i.e. local connections, while retaining global residual learning (GRL). Similarly, RDN and ESRGAN use dense connections alongside global ones. Modern CNNs continue to innovate with new types of connections between different layers or modules.

Conclusion:

Single-image super-resolution is a challenging research problem with important real-life applications. The phenomenal success of deep learning approaches has resulted in rapid growth in deep convolutional network-based techniques for image super-resolution. A diverse set of approaches have been proposed with exciting innovations in network architectures and learning methodologies. Through extensive quantitative and qualitative comparisons, we note the following trends in the existing art: (a) GAN-based approaches generally deliver visually pleasing outputs while the reconstruction error based methods more accurately preserve spatial details in an image, (b) for the case of high magnification rates (8× or above), the existing models generally deliver sub-optimal results, (c) the top-performing methods generally have higher computational complexity and are deeper than their counterparts, (d) residual learning has been a major contributing factor for performance improvement due to its signal decomposition that makes the learning task easier. Overall, we note that the SR performance has been greatly enhanced in recent years with a corresponding increase in the network complexity. Remarkably, the state-of-the-art approaches still suffer from limitations that restrict their application to key real-world scenarios (e.g., inadequate metrics, high model complexity, inability to handle real-life degradations). We hope this article will attract new efforts towards the solution of these crucial problems.
