Source: Deep Learning on Medium

## Outperforms MobileNetV1, NASNet, and ShuffleNet V1

In this story, **MobileNetV2**, by **Google**, is briefly reviewed. In the previous version, MobileNetV1, **Depthwise Separable Convolution** was introduced, which dramatically reduces the computational cost and model size of the network, making it suitable for mobile devices, or any devices with low computational power. In MobileNetV2, a better module with an **inverted residual structure** is introduced. **Non-linearities in narrow layers are removed** this time. With MobileNetV2 as the backbone for feature extraction, state-of-the-art performance is also achieved for object detection and semantic segmentation. This is a paper in **2018 CVPR** with more than **200 citations**. (Sik-Ho Tsang @ Medium)

### Outline

1. **MobileNetV2 Convolutional Blocks**
2. **Overall Architecture**
3. **Ablation Study**
4. **Experimental Results**

### 1. MobileNetV2 Convolutional Blocks

#### 1.1. MobileNetV1

- In MobileNetV1, there are 2 layers.
- The **first layer** is called a **depthwise convolution**; it performs lightweight filtering by applying a single convolutional filter per input channel.
- The **second layer** is a **1×1 convolution**, called a **pointwise convolution**, which is responsible for building new features through computing linear combinations of the input channels.
- **ReLU6** is used here for comparison (in the original MobileNetV1, the normal ReLU is used), i.e. ReLU6(*x*) = min(max(*x*, 0), 6).

- ReLU6 is used due to its robustness in low-precision computation, based on [27].
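To make the two layers concrete, here is a minimal PyTorch sketch of a MobileNetV1-style depthwise separable convolution block (the class and variable names are mine, and the batch normalization placement follows common practice rather than being spelled out in this story):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """MobileNetV1-style block: 3x3 depthwise conv + 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # Pointwise: 1x1 conv builds new features by mixing channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # ReLU6(x) = min(max(x, 0), 6).
        self.relu6 = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.relu6(self.bn1(self.depthwise(x)))
        return self.relu6(self.bn2(self.pointwise(x)))
```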

#### 1.2. MobileNetV2

- In MobileNetV2, there are two types of blocks. One is a residual block with stride of 1. The other is a block with stride of 2 for downsizing.
- There are 3 layers for both types of blocks.
- This time, the **first layer** is a **1×1 convolution with ReLU6**.
- The **second layer** is the **depthwise convolution**.
- The **third layer** is another **1×1 convolution, but without any non-linearity**. It is claimed that if ReLU is used again, the deep networks only have the power of a linear classifier on the non-zero volume part of the output domain.

- And there is an expansion factor *t*; *t* = 6 is used for all main experiments.
- If the input has 64 channels, the internal output would have 64×*t* = 64×6 = 384 channels.
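Putting the three layers together, here is a minimal PyTorch sketch of the inverted residual block (the names are mine; the stride-2 variant simply has no shortcut):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2 bottleneck: 1x1 expand (ReLU6) -> 3x3 depthwise (ReLU6)
    -> 1x1 linear projection, with a shortcut only at stride 1 when the
    input and output channel counts match."""
    def __init__(self, in_ch, out_ch, stride=1, t=6):
        super().__init__()
        hidden = in_ch * t  # e.g. 64 input channels, t=6 -> 384 internally
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # Layer 1: 1x1 expansion convolution with ReLU6
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # Layer 2: 3x3 depthwise convolution with ReLU6
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # Layer 3: 1x1 linear bottleneck (no non-linearity here)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```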

### 2. Overall Architecture

- In the architecture table, *t*: expansion factor, *c*: number of output channels, *n*: number of repetitions, and *s*: stride; the bottleneck schedule is reproduced in the sketch after this list. 3×3 kernels are used for spatial convolution.
- Typically, the **primary network** (width multiplier 1, **224×224**) has a computational cost of **300 million multiply-adds** and uses **3.4 million parameters**. (The width multiplier is introduced in MobileNetV1.)
- The performance trade-offs are further explored for **input resolutions from 96 to 224** and **width multipliers of 0.35 to 1.4**.
- The network computational cost goes up to 585M MAdds, while the model size varies between 1.7M and 6.9M parameters.
- To train the network, 16 GPUs are used with a batch size of 96.
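For reference, the bottleneck schedule from the paper's Table 2 can be stacked using the InvertedResidual sketch above (`CFG` and `make_stages` are my names, and the channel rounding is simplified; real implementations round channel counts to multiples of 8):

```python
import torch.nn as nn

# (t, c, n, s) per stage, from the paper's Table 2: expansion factor,
# output channels, number of repeats, stride of the first block.
CFG = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]

def make_stages(width_mult=1.0):
    """Stack InvertedResidual blocks (see the sketch in Section 1.2);
    the width multiplier scales every channel count."""
    in_ch = int(round(32 * width_mult))  # channels after the stem 3x3 conv
    layers = []
    for t, c, n, s in CFG:
        out_ch = int(round(c * width_mult))
        for i in range(n):
            stride = s if i == 0 else 1  # only the first repeat downsamples
            layers.append(InvertedResidual(in_ch, out_ch, stride, t))
            in_ch = out_ch
    return nn.Sequential(*layers)
```

The full network adds an initial 3×3 convolution with 32 channels in front of these stages and a final 1×1 convolution to 1280 channels before the classifier.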

### 3. Ablation Study

#### 3.1. Impact of Linear Bottleneck

- With the removal of ReLU6 at the output of each bottleneck module, accuracy is improved.

#### 3.2. Impact of Shortcut

- With the shortcut between bottlenecks, the network outperforms the variants with the shortcut between expansions and without any residual connections.

### 4. Experimental Results

#### 4.1. ImageNet Classification

- MobileNetV2 outperforms MobileNetV1 and ShuffleNet (1.5) with comparable model size and computational cost.
- With a width multiplier of 1.4, MobileNetV2 (1.4) outperforms ShuffleNet (×2) and NASNet, with faster inference time.

- As shown above, different input resolutions and width multipliers are used. It consistently outperforms MobileNetV1.

#### 4.2. MS COCO Object Detection

- First, SSDLite is introduced by replacing the regular convolutions in SSD with depthwise separable convolutions (the MobileNetV1 kind); a sketch follows this list.
- SSDLite dramatically reduces both parameter count and computational cost.
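To make the modification concrete, here is a minimal sketch of an SSDLite-style prediction layer (the function name and the exact placement of batch normalization are my assumptions):

```python
import torch.nn as nn

def ssdlite_pred_layer(in_ch, out_ch):
    """SSDLite prediction layer: the regular 3x3 convolution from SSD is
    replaced by a 3x3 depthwise convolution plus a 1x1 pointwise one."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU6(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1),  # e.g. out_ch = anchors * 4 for boxes
    )
```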

- MobileNetV2 + SSDLite achieves competitive accuracy with significantly fewer parameters and smaller computational complexity.
- Also, the inference time is faster than that of the MobileNetV1 counterpart.
- Notably, MobileNetV2 + SSDLite is 20× more efficient and 10× smaller, while still outperforming YOLOv2 on the COCO dataset.

#### 4.3. PASCAL VOC 2012 Semantic Segmentation

- Here, MobileNetV2 is used as the feature extractor for DeepLabv3.
- By disabling Atrous Spatial Pyramid Pooling (ASPP) as well as Multi-Scale inputs and left-right Flipping (MF), and changing the output stride from 8 to 16, an mIOU of 75.32% is obtained with a much smaller model size and computational cost.
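As a rough sketch of what the reduced head looks like (my naming and channel choices, assuming a plain 1×1 classifier on the 1/16-resolution backbone features once ASPP is disabled; the paper's exact reduced design may differ in details):

```python
import torch.nn as nn

# With ASPP, multi-scale inputs, and flipping disabled, the DeepLabv3
# head reduces to a simple classifier on MobileNetV2 features at
# output stride 16 (feature map is 1/16 of the input resolution).
seg_head = nn.Sequential(
    nn.Conv2d(320, 256, 1, bias=False),  # 320 = last bottleneck stage
    nn.BatchNorm2d(256),
    nn.ReLU6(inplace=True),
    nn.Conv2d(256, 21, 1),  # 21 PASCAL VOC classes incl. background
)
```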

#### Reference

[2018 CVPR] [MobileNetV2]

MobileNetV2: Inverted Residuals and Linear Bottlenecks

#### My Previous Reviews

**Image Classification** [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]

**Object Detection** [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]

**Semantic Segmentation** [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3]

**Biomedical Image Segmentation** [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [3D U-Net+ResNet]

**Instance Segmentation** [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]

**Super Resolution** [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet]

**Human Pose Estimation** [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM]