Review: RoR — ResNet of ResNet / Multilevel ResNet (Image Classification)


In this story, RoR (Residual Networks of Residual Networks) is briefly reviewed. After the success of ResNet, which became a state-of-the-art deep learning approach and won numerous recognition competitions, there has been much research on how to generalize or improve ResNet, such as Pre-Activation ResNet, ResNet in ResNet (RiR), ResNet with Stochastic Depth (SD), and Wide Residual Network (WRN). RoR is another paper that improves ResNet: it introduces the concept that a group of ResNet blocks can also have a shortcut connection, which turns the network into a multilevel hierarchical ResNet. The paper first appeared in 2016, soon after ResNet, was accepted in 2017, and was finally published in 2018 TCSVT; it has already received tens of citations. (SH Tsang @ medium)


What Is Covered

  1. Concept of RoR (Residual Networks of Residual Networks)
  2. RoR-m: Level Number m
  3. Different Versions of RoR
  4. Results

1. Concept of RoR (Residual Networks of Residual Networks)

Original ResNet (Left), RoR (Right)

The original ResNet is shown on the left above: numerous Residual Blocks are cascaded together to form a very deep network.

Within a Residual Block, there are two paths:

  • Convolution path, which performs convolutions to extract features
  • Shortcut connection path, which directly transmits the input signal to the next layer

With shortcut connection paths, the vanishing-gradient problem is reduced because the error signal can propagate to early layers more easily during backpropagation.
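
As a concrete illustration, below is a minimal PyTorch sketch of one Residual Block with these two paths. The channel count and the Conv-BN-ReLU layout are illustrative assumptions, not exact settings from the paper.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            # Convolution path: performs convolutions to extract features.
            self.conv_path = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            # Shortcut connection path: the input x is added back unchanged,
            # so the error signal can flow straight to earlier layers.
            return self.relu(self.conv_path(x) + x)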

RoR, shown on the right above, proposes that we can also have a shortcut connection across a group of Residual Blocks. On top of this, we can have another level of shortcut connection across a group of “groups of Residual Blocks”.

Authors argue that:

  • RoR transfers the learning problem to learning residual mappings of residual mappings, which are simpler and easier to learn than those in the original ResNet.
  • Layers in upper blocks can also propagate information to layers in lower blocks.

2. RoR-m: Level Number m

Level number m is introduced:

  • When m = 1, RoR has only the final-level shortcuts, i.e. it is the original residual network.
  • When m = 2, RoR has the root-level (the outermost level) and final-level shortcuts.
  • When m = 3, RoR has the root-level, middle-level, and final-level shortcuts.

For the middle-level shortcuts, each shortcut spans the Residual Blocks that have the same feature-map size.

m = 4 and m = 5 were also tested, but without any details about them in the paper. In any case, their results were not as good as those with m = 3.
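
To make the m = 3 hierarchy concrete, here is a minimal PyTorch sketch that reuses the ResidualBlock class from the sketch above. The group count, block count, and the identity outer shortcuts are simplifying assumptions: this toy version keeps a single feature-map size throughout, whereas the paper uses 1x1 convolutions on shortcuts when shapes change.

    import torch.nn as nn

    class BlockGroup(nn.Module):
        """A group of Residual Blocks wrapped by a middle-level shortcut."""
        def __init__(self, channels, num_blocks):
            super().__init__()
            self.blocks = nn.Sequential(
                *[ResidualBlock(channels) for _ in range(num_blocks)]
            )

        def forward(self, x):
            # Middle-level shortcut: spans all blocks in this group, which
            # share the same feature-map size.
            return self.blocks(x) + x

    class RoR3(nn.Module):
        """RoR with m = 3: final-, middle-, and root-level shortcuts."""
        def __init__(self, channels=16, blocks_per_group=6, num_groups=3):
            super().__init__()
            self.groups = nn.Sequential(
                *[BlockGroup(channels, blocks_per_group) for _ in range(num_groups)]
            )

        def forward(self, x):
            # Root-level shortcut: spans all groups of Residual Blocks.
            # The final-level shortcuts live inside each ResidualBlock.
            return self.groups(x) + x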


3. Different Versions of RoR

RoR-3 Using Original ResNet (Left), RoR-3 Using Pre-Activation ResNet or WRN (Right)

RoR is applied to different versions of ResNet as shown above.

In brief, RoR-3 uses the Conv-BN-ReLU ordering, Pre-RoR-3 uses BN-ReLU-Conv, and the WRN variant is a wider and shallower version of Pre-RoR-3. (If interested, please read my reviews of the original ResNet, Pre-Activation ResNet, and WRN.)
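
The difference between the two orderings is simply where batch normalization and the activation sit relative to each convolution. A minimal sketch of the two unit layouts, with an assumed channel count:

    import torch.nn as nn

    channels = 16

    # RoR-3 / original ResNet unit: Conv-BN-ReLU (post-activation).
    post_activation = nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )

    # Pre-RoR-3 / Pre-Activation ResNet unit: BN-ReLU-Conv (pre-activation).
    pre_activation = nn.Sequential(
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    )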


4. Results

4.1. CIFAR-10, CIFAR-100, SVHN

  • CIFAR-10: 10-class dataset
  • CIFAR-100: 100-class dataset
  • SVHN: Street View House Numbers dataset

Test Error (%) on CIFAR-10, CIFAR-100, and SVHN Datasets

  • RoR-3-164: By applying RoR to the 164-layer original ResNet, test errors of 4.86% and 22.47% are obtained on CIFAR-10 and CIFAR-100 respectively. (164 is the model depth; +SD denotes the use of Stochastic Depth to reduce overfitting.)
  • Pre-RoR-3-164+SD: By replacing the original ResNet with Pre-Activation ResNet, test errors of 4.51% and 21.94% are obtained on CIFAR-10 and CIFAR-100 respectively.
  • RoR-3-WRN40-4+SD: By replacing the Pre-Activation ResNet with the wider 40-layer WRN40-4, test errors of 4.09% and 20.11% are obtained on CIFAR-10 and CIFAR-100 respectively.
  • RoR-3-WRN58-4+SD: With the deeper 58-layer WRN58-4, test errors of 3.77% and 19.73% are obtained on CIFAR-10 and CIFAR-100 respectively.

4.2. ImageNet

ImageNet: the 1000-class large-scale dataset used in ILSVRC.

10-Crop Testing of Validation Error (%) on ImageNet Dataset

RoR-3 variants of different depths consistently outperform the corresponding plain ResNet variants.

There are detailed ablation experiments in the paper; if interested, please read the paper.


A similar approach, using long and short skip connections, has also been applied to biomedical image segmentation. Hopefully I can cover that as well.