Source: Deep Learning on Medium
Outperforms or Comparable With ResNet, Pre-Activation ResNet, Stochastic Depth, WRN, FractalNet, DenseNet, ResNeXt, PyramidNet, IGCNet / IGCV1
In this story, DMRNet / DFN-MR, by Zhejiang University, University of Science and Technology of China, Chinese Academy of Sciences, UC San Diego, and Microsoft Research, is briefly reviewed. Using merge-and-run mappings, the Merge-and-Run Block is introduced, which shares an averaged path between two parallel convolutional branches. The idea was first published in a 2016 arXiv paper, “On the Connection of Deep Fusion to Ensembling”, as DFN-MR. It was then published at 2018 IJCAI as “Deep Convolutional Neural Networks with Merge-and-Run Mappings”, as DMRNet, with 10 citations. The Microsoft link mentions DFN-MR but points to DMRNet; both introduce the merge-and-run idea. Thus, in this story, I mainly review the DMRNet paper. (Sik-Ho Tsang @ Medium)
- From Residual Block to Merge-and-Run Block
- Analyses of Merge-and-Run Block
- Experimental Results
1. From Residual Block to Merge-and-Run Block
1.1. Residual Block (ResNet)
- The above equation is the very familiar equation of a Residual Block from ResNet. With x_t as the input to the t-th residual block and H(x_t) as the output of the convolutional path, adding the two gives x_t+1.
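Since the equation image from the original post is not reproduced here, the standard ResNet residual-block recursion, reconstructed from the description above, is:

```latex
x_{t+1} = x_t + H(x_t)
```

Here the identity term x_t is the shortcut connection, and H(x_t) is the residual branch (a stack of convolutions).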
1.2. Vanilla-Assembly Block (DVANet)
- Before talking about Merge-and-Run block, there is also the Vanilla-Assembly Block.
- It is a ResNeXt-like block but with only 2 convolutional paths. (If interested, please read my review on ResNeXt.)
1.3. Merge-and-Run Block (DMRNet)
- A Merge-and-Run Block is formed by assembling two residual branches in parallel with a merge-and-run mapping:
- Merge: Average the inputs of the two residual branches.
- Run: Add the average to the output of each residual branch, forming the input of the subsequent residual branch.
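The two steps above can be sketched in a few lines. This is a minimal sketch, not the paper's implementation: `branch1` and `branch2` are hypothetical callables standing for the two residual branches, which are stacks of convolutions in the paper.

```python
import numpy as np

def merge_and_run_block(x, y, branch1, branch2):
    """One Merge-and-Run block (sketch).

    Merge: average the inputs of the two residual branches.
    Run:   add the average to each branch's output to form the
           inputs of the next block.
    """
    avg = 0.5 * (x + y)        # merge: average the two branch inputs
    x_next = branch1(x) + avg  # run: add the average to branch 1's output
    y_next = branch2(y) + avg  # run: add the average to branch 2's output
    return x_next, y_next

# With zero branches, the block reduces to the merge-and-run mapping alone:
x0, y0 = np.array([2.0]), np.array([4.0])
zero = lambda t: np.zeros_like(t)
x1, y1 = merge_and_run_block(x0, y0, zero, zero)  # both equal the average, 3.0
```

Note that, unlike a Residual Block, each branch's shortcut carries the average of both inputs rather than its own input alone, which is what couples the two branches.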
2. Analyses of Merge-and-Run Block
2.1. Information Flow Improvement
- The above equations can be written in matrix form and unrolled recursively.
- This shows that during the forward pass there are quick paths directly sending the input and the intermediate outputs to later blocks.
- A similar conclusion can be drawn for gradient back-propagation.
- Thus, merge-and-run mappings can improve both forward and backward information flow.
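Since the equation images from the original post are not reproduced here, one reconstruction of the merge-and-run recursion in matrix form, consistent with the block definition above (x_l and y_l are the two branch inputs at block l, and H_l^1, H_l^2 the two residual branches), is:

```latex
\begin{bmatrix} x_{l+1} \\ y_{l+1} \end{bmatrix}
=
\begin{bmatrix} H_l^1(x_l) \\ H_l^2(y_l) \end{bmatrix}
+
M \begin{bmatrix} x_l \\ y_l \end{bmatrix},
\qquad
M = \frac{1}{2}
\begin{bmatrix} I & I \\ I & I \end{bmatrix}
```

Because M is idempotent (M^2 = M), unrolling the recursion leaves only a single M between the input (or any intermediate residual output) and every later block, which is the quick-path behavior for both the forward and backward pass.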
2.2. Shorter Paths
- All the three networks are mixtures of paths, where a path is defined as a sequence of connected residual branches, identity mappings, and possibly other layers (e.g., the first convolution layer, the FC layer) from the input to the output.
- The paths of the proposed networks are distributed over shorter lengths, which potentially leads to better performance.
2.3. DVANet and DMRNet are Wider
- In matrix form, the Vanilla-Assembly Block (DVANet) has two parallel residual branches, so it is wider than a Residual Block.
- Hence, the Merge-and-Run Block (DMRNet) is also wider.
- But in DMRNet, the two residual branches are not independent, as they are coupled by the merge-and-run mapping.
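The contrast between the two blocks can be seen in their transition matrices. Assuming the matrix-form notation from Section 2.1 (a reconstruction, since the original equation images are missing), the vanilla-assembly mapping is a block identity, while merge-and-run averages across branches:

```latex
M_{\text{VA}} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix},
\qquad
M_{\text{MR}} = \frac{1}{2}\begin{bmatrix} I & I \\ I & I \end{bmatrix}
```

With M_VA the two branches never exchange information through the shortcut; with M_MR each branch's shortcut mixes in the other branch's input.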
3. Experimental Results
3.1. Merge-and-Run Mapping
- Networks with the merge-and-run mapping consistently perform better than the corresponding networks without it.
3.2. Comparison with Wide ResNet
- DMRNet performs the best on CIFAR-10.
- The superiority of DVANets over ResNets stems from their shorter paths and greater width.
- On CIFAR-100 and SVHN, when the network is deep enough, DMRNet performs the best.
- But when the network is not deep enough, ResNet and Wide ResNet are better. The authors believe that in such shallow networks, the paths in DVANet and DMRNet are not very long, and having too many short paths lowers performance.
3.3. Combination with ResNeXt
- ResNeXt supports K > 2 convolutional paths in a block.
- Applying the merge-and-run mapping to ResNeXt yields DMRNeXt, which outperforms ResNeXt, showing the effectiveness of the merge-and-run mapping.
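As an illustration only, one natural way to extend the two-branch sketch above to K parallel branches is to average all K branch inputs. This is a hypothetical generalization, not necessarily the exact DMRNeXt construction from the paper; `branches` is a list of K stand-in callables for the grouped convolution paths.

```python
import numpy as np

def merge_and_run_k(inputs, branches):
    """Hypothetical K-branch merge-and-run step (illustrative sketch):
    average the inputs of all K branches, then add that average to
    every branch's output."""
    avg = sum(inputs) / len(inputs)                        # merge
    return [b(x) + avg for b, x in zip(branches, inputs)]  # run

# With K = 3 zero branches, every output equals the input average:
ins = [np.array([1.0]), np.array([2.0]), np.array([6.0])]
outs = merge_and_run_k(ins, [lambda t: np.zeros_like(t)] * 3)  # each -> 3.0
```

Setting K = 2 recovers the two-branch merge-and-run mapping described in Section 1.3.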
3.4. Combination with Xception
- Here, DMRNets contain two Xception blocks.
- Again, it outperforms Xception, which shows the effectiveness of the merge-and-run mapping.
3.5. Comparison with State-of-the-art Approaches
There are more experiments and analyses in the paper. If interested, please feel free to read the paper.
[2016 arXiv] [DFN-MR]
On the Connection of Deep Fusion to Ensembling
[2018 IJCAI] [DMRNet]
Deep Convolutional Neural Networks with Merge-and-Run Mappings
My Previous Reviews
[LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet]
[OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]