Review: DRRN — Deep Recursive Residual Network (Super Resolution)

Source: Deep Learning on Medium

Digital Image Enlargement, The Need of Super Resolution

In this story, DRRN (Deep Recursive Residual Network) is reviewed. By combining Global Residual Learning (GRL) with multi-path Local Residual Learning (LRL), and using recursive learning to keep the number of parameters under control as the depth increases, the network can reach up to 52 layers. DRRN significantly outperforms state-of-the-art approaches such as SRCNN, FSRCNN, ESPCN, VDSR, DRCN, and RED-Net. It was published in 2017 CVPR and has more than 100 citations. (SH Tsang @ Medium)

Outline

  1. DRRN Architecture
  2. Number of Residual Units (U) & Number of Recursive Blocks (B)
  3. Ablation Study
  4. Results

1. DRRN Architecture

Comparison of Different Architectures
  • ResNet: Residual units are used for prediction. Each residual unit has an identity mapping (i.e. a skip connection) and a residual mapping made of two convolutions.
  • VDSR: Residual learning is applied between the input low-resolution (LR) image and the output high-resolution (HR) image. This is global residual learning (GRL). 20 layers of 3×3 convolution are stacked in the residual branch, giving a large receptive field of 41×41. VDSR is in fact a special case of DRRN (see Section 2.2).
  • DRCN: There is also a skip connection, which can be treated as GRL. In the residual branch, recursive convolution is used, i.e. parameters are shared among all convolutions (green). The output of each recursive convolution is used to generate an HR image, and all these HR images are combined in a weighted ensemble to get the final output.
  • DRRN: GRL is also used, given its effectiveness in VDSR and DRCN. In the residual branch, instead of the recursive convolutions of DRCN, a stack of residual blocks is used, with recursive convolutions inside the residual blocks, as shown above. This introduces multi-path local residual learning (LRL).
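The data flow described for DRRN can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the names recursive_block and drrn_forward are hypothetical, and simple callables on numbers stand in for the convolutions. It shows the two kinds of skip connection: every residual unit inside a block adds the same block-entry feature (LRL) and reuses the same residual function (recursive learning), while the network output adds back its own input (GRL).

```python
def recursive_block(x, entry_conv, residual_fn, num_units):
    """One recursive block: an entry convolution followed by U residual units."""
    h0 = entry_conv(x)          # identity branch shared by all units (multi-path LRL)
    h = h0
    for _ in range(num_units):  # the same residual_fn is reused, so weights are shared
        h = residual_fn(h) + h0
    return h


def drrn_forward(x, blocks, num_units, recon_conv):
    """Stack B recursive blocks, then add the input back (GRL)."""
    h = x
    for entry_conv, residual_fn in blocks:
        h = recursive_block(h, entry_conv, residual_fn, num_units)
    return recon_conv(h) + x    # global residual: the branch only predicts the difference


# Toy stand-ins for convolutions, chosen only to make the arithmetic easy to follow.
blocks = [(lambda v: 2 * v, lambda v: 3 * v)]   # B = 1
y = drrn_forward(1, blocks, num_units=2, recon_conv=lambda v: -v)
print(y)
```

With x = 1 the block-entry feature is 2, the two residual units give 3×2+2 = 8 and then 3×8+2 = 26, and the global skip turns the branch output of −26 into −25.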

2. Number of Residual Units (U) & Number of Recursive Blocks (B)

Number of Residual Units (U)

2.1. Number of Residual Units (U)

  • U: Number of residual units in a recursive block. Examples of U={1,2,3} are shown above.
  • Note that the number of parameters does not increase with U, because the parameters are shared among the residual units.
  • A pre-activation residual unit is used, i.e. BN-ReLU-Conv instead of Conv-BN-ReLU, as suggested in Pre-Activation ResNet to improve on the original ResNet.

2.2. Number of Recursive Blocks (B)

B=6, 6 Recursive Blocks (Left), U=3, 3 Residual Units in a Recursive Block (right)
  • The above shows the meaning of B, i.e. how many recursive blocks are used within the network.
  • Thus, the depth of DRRN d (number of convolutional layers) can be calculated based on B and U:
d = (1 + 2×U) × B + 1
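  • If U=0, DRRN becomes VDSR: each recursive block reduces to a single convolution, so d = B + 1 (e.g. B = 19 gives the 20 layers of VDSR).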
  • The loss function is the standard mean squared error (MSE) between the network output and the ground-truth HR image.
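A minimal sketch of such an MSE loss, in plain Python over flattened patches. The function name mse_loss and the per-pixel averaging are assumptions made for illustration; a different normalisation constant (for example an extra factor of 1/2) would only rescale the loss.

```python
def mse_loss(predictions, targets):
    """Mean squared error over a batch of flattened patches."""
    total, count = 0.0, 0
    for pred, target in zip(predictions, targets):
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count


pred = [[0.0, 1.0], [2.0, 3.0]]   # two predicted patches of two pixels each
true = [[0.0, 2.0], [2.0, 1.0]]   # the corresponding ground-truth patches
print(mse_loss(pred, true))       # (0 + 1 + 0 + 4) / 4
```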

3. Ablation Study

3.1. Some Details

  • Training Set: 91 images from Yang et al. and 200 images from the Berkeley Segmentation Dataset, 291 images in total.
  • Test Set: Set5, Set14, BSD100, and Urban100.
  • Data Augmentation: Flipped and rotated versions are used, giving 7 additional augmented versions of each image. Scale augmentation is also used, with scales ×2, ×3, and ×4.
  • Patches of 31×31 with a stride of 21 are used. The mini-batch size is 128.
  • Every convolutional layer has 128 filters of size 3×3.
  • With d=20, training takes 4 days on 2 Titan X GPUs.
  • Bicubic interpolation is first applied to the input before it goes into the network.
  • Only the luminance component is used.
  • Pixels near the image boundary are cropped before evaluation.
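The augmentation and patch settings above can be illustrated with a small plain-Python sketch. It assumes that the 7 additional versions come from combining the four 90-degree rotations with an optional horizontal flip, and that patches are taken on a regular grid; the helper names are hypothetical, and scale augmentation is left out.

```python
def rotate90(img):
    """Rotate a 2-D list of pixels by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror each row of a 2-D list of pixels."""
    return [row[::-1] for row in img]


def augment(img):
    """Return the original plus its rotated and flipped versions (8 in total)."""
    versions = []
    current = [list(row) for row in img]
    for _ in range(4):
        versions.append(current)
        versions.append(flip_horizontal(current))
        current = rotate90(current)
    return versions


def patch_origins(height, width, size=31, stride=21):
    """Top-left corners of all size x size patches on a regular stride grid."""
    return [(r, c)
            for r in range(0, height - size + 1, stride)
            for c in range(0, width - size + 1, stride)]


variants = augment([[1, 2], [3, 4]])
print(len(variants) - 1)              # additional versions besides the original
print(len(patch_origins(100, 100)))   # patches from a 100 x 100 image
```

For a 100×100 image the patch origins along each axis are 0, 21, 42 and 63, so the grid holds 4×4 = 16 patches.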

3.2. Study of B and U

Various combinations of B and U (scaling factor ×3 on Set5)
  • By fixing one parameter to 3 and changing the other from 1 to 4, the above figure shows that increasing B or U gives a deeper model and better performance, which indicates that deeper is still better.
  • The models are comparable as long as their depths are similar, e.g., B2U3 (d = 15, k = 784K) and B3U2 (d = 16, k = 1,182K) achieve 33.76 and 33.77 dB, respectively. (k = #parameters)
  • By fixing one parameter to 1 and changing the other to construct networks with d = 52, we get B1U25 (k = 297K) and B17U1 (k = 7,375K). For B1U25, a single recursive block with 25 residual units is learned recursively. For B17U1, 17 recursive blocks are stacked without any recursive learning.
  • B3U8 (d = 52, k = 1,182K) is also constructed.
  • B17U1, B3U8, and B1U25, all with d = 52, achieve similar performance.
  • B1U25 uses far fewer parameters, so it is treated as the best model.
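The depths and parameter counts quoted in this section can be roughly reproduced with a short sketch. It reads the depth formula as d = (1 + 2×U) × B + 1, which matches every depth listed above. The parameter count is an assumption, not the authors' stated accounting: weights only (no biases or BN), a single-channel input and output, 128 filters of 3×3, one unshared entry convolution per recursive block, and two convolutions shared by all residual units of a block. The function names are hypothetical.

```python
def drrn_depth(B, U):
    """Number of convolutional layers: d = (1 + 2*U) * B + 1."""
    return (1 + 2 * U) * B + 1


def drrn_weights(B, filters=128, kernel=3, channels=1):
    """Weight count under the assumptions stated above (no biases, no BN)."""
    full = filters * filters * kernel * kernel    # 128 -> 128 convolution
    edge = channels * filters * kernel * kernel   # 1 -> 128 or 128 -> 1 convolution
    shared_pair = 2 * full                        # two convs reused by all U units
    entry_convs = edge + (B - 1) * full           # the first block sees the image itself
    return entry_convs + B * shared_pair + edge   # plus the final reconstruction conv


for B, U in [(1, 25), (3, 8), (17, 1)]:
    print(f"B{B}U{U}: d = {drrn_depth(B, U)}, k = {drrn_weights(B)}")
```

Note that U does not appear in drrn_weights at all, which is the effect of weight sharing: B1U25 and B1U9 have the same count. Under these assumptions the three configurations come out at 297,216, 1,181,952 and 7,375,104 weights, in line with the 297K, 1,182K and 7,375K quoted. For B = 2 the same count gives about 740K rather than the 784K quoted for B2U3, so the counting convention should be treated as approximate.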

3.3. DRRN Variants

DRRN Variants (NS: No Sharing of Weights, C: Chained, Not Multi-path)
  • DRRN_NS_C: With LRL but without multi-path recursive learning, 33.92dB.
  • DRRN_NS: DRRN but without weight sharing, 33.97dB.
  • DRRN_C: DRRN but without using multi-path, just a chained convolution at the residual branch, 33.95dB.
  • DRRN: 33.99dB. This shows that all components are important to improve the results.

4. Results

4.1. Comparison with State-of-the-Art Models

Benchmark Results Using PSNR and SSIM
  • DRRN_B1U9 (d = 20, k = 297K): It has the same depth as VDSR and DRCN but fewer parameters.
  • Both DRRN_B1U9 and DRRN_B1U25 outperform all other approaches, including SRCNN, VDSR and DRCN, on all datasets and scaling factors.
  • On the Urban100 dataset in particular, DRRN outperforms all other approaches by a large margin.
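For reference, PSNR of the kind reported in these benchmarks can be computed as in the sketch below. It is a plain-Python illustration, not the authors' evaluation code: it assumes 8-bit luminance values (peak 255) stored as lists of rows, and the border argument stands in for the boundary cropping mentioned in the training details, whose exact width is not given here.

```python
import math


def psnr(reference, estimate, border=0, peak=255.0):
    """PSNR in dB between two equally sized 2-D luminance arrays (lists of rows)."""
    rows = range(border, len(reference) - border)
    cols = range(border, len(reference[0]) - border)
    squared_error = sum((reference[r][c] - estimate[r][c]) ** 2
                        for r in rows for c in cols)
    mse = squared_error / (len(rows) * len(cols))
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


hr = [[100, 100, 100, 100] for _ in range(4)]
sr = [[0, 0, 0, 0],
      [0, 101, 99, 0],
      [0, 99, 101, 0],
      [0, 0, 0, 0]]
print(psnr(hr, sr, border=1))    # only the central 2 x 2 region is compared
```

In this toy case the cropped region has a squared error of 1 at every pixel, so the result is 20·log10(255), roughly 48.13 dB, even though the border pixels differ wildly.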
Benchmark Results Using Information Fidelity Criterion (IFC) metric
  • The Information Fidelity Criterion (IFC), a metric that correlates with perceptual scores, is also evaluated.
  • Here, VDSR is re-implemented by the authors using BN. (The original VDSR does not use BN.)
  • DRRN still outperforms all approaches.
  • The 20-layer B1U9 DRRN takes 0.25 seconds to process a 288×288 image on a Titan X GPU.
PSNR for scale factor ×3 on Set5 and Set14
  • The results show that a deeper model is crucial, and DRRN is the best among the models of the past three years, including SRCNN, FSRCNN, ESPCN, VDSR, DRCN and RED-Net.
  • In terms of number of parameters, DRRN has the highest PSNR while having relatively few parameters, thanks to weight sharing.

4.2. Qualitative Results

  • DRRN produces sharper edges, while the results of the other methods are blurry.

The GRL and LRL remind me of RoR and U-Net+ResNet, where long and short skip connections are used to improve accuracy. There are differences, of course: for example, in LRL the skip branches always take the same input, and the weights of the convolutions in the residual branches are shared.

Reference

[2017 CVPR] [DRRN]
Image Super-Resolution via Deep Recursive Residual Network

My Previous Reviews

Image Classification
[LeNet] [AlexNet] [ZFNet] [VGGNet] [SPPNet] [PReLU-Net] [STN] [DeepImage] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet]

Object Detection
[OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [DeepID-Net] [R-FCN] [ION] [MultiPathNet] [NoC] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [FPN] [RetinaNet] [DCN]

Semantic Segmentation
[FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [ParseNet] [DilatedNet] [PSPNet] [DeepLabv3]

Biomedical Image Segmentation
[CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet]

Instance Segmentation
[DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]

Super Resolution
[SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN]