Source: Deep Learning on Medium
1st Place for Single Model Results in ILSVRC 2016 Object Detection Challenge
This time, the approach by Hikvision (海康威视) in the ILSVRC 2016 object detection challenge is briefly reviewed. Hikvision was founded in 2001 and is based in Hangzhou, China. It develops core technologies in audio and video encoding, video image processing, and related data storage, as well as forward-looking technologies such as cloud computing, big data, and deep learning.
Hikvision achieved the following results in ILSVRC 2016:
- Object Detection: 2nd place, 65.27% mAP
- Object Localization: 2nd place, 8.74% error
- Scene Classification: 1st place, 9.01% error
- Scene Parsing: 7th place, 53.5% (average of mean IoU and pixel accuracy)
In this story, I focus only on the detection challenge. Although Hikvision achieved state-of-the-art results on the detection task, its approach contains little technical novelty. Perhaps for this reason, they have not published any paper or technical report about it.
Instead, they only shared their approaches and results at the joint ImageNet and COCO workshop at ECCV 2016. (Sik-Ho Tsang @ Medium)
Outline
- Cascaded RPN
- Global Context
- Other Techniques
- Summary of Object Detection Elements
- Results
1. Cascaded RPN
- A cascaded Region Proposal Network (RPN) is used to generate proposals.
- Naïve RPN: batch size of 256, with a negative/positive (N/P) sample ratio usually above 10.
- Cascaded RPN: batch size of 32, with a maximum N/P ratio of only 1.5.
- With the cascaded RPN and its better-balanced N/P ratio, proposal recall improves.
- The gain is 9.5% in Recall@0.7 (recall at an IoU threshold of 0.7).
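The review does not say how the stages of the cascade are connected, so the minimal sketch below only illustrates the two quantities this section relies on: minibatch sampling with a capped N/P ratio, and proposal recall at an IoU threshold of 0.7. The function names (`sample_minibatch`, `recall_at_iou`) are hypothetical, and the rule that positives fill at most half of the batch is an assumption borrowed from the usual Faster R-CNN RPN setting, not taken from Hikvision's slides.

```python
import random
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_minibatch(pos: Sequence[int], neg: Sequence[int], batch_size: int,
                     max_np_ratio: Optional[float],
                     rng: random.Random) -> Tuple[List[int], List[int]]:
    """Sample anchor indices; max_np_ratio=None means no N/P cap (naive RPN)."""
    # Assumption: positives fill at most half of the batch (Faster R-CNN style).
    n_pos = min(len(pos), batch_size // 2)
    neg_cap = batch_size - n_pos
    if max_np_ratio is not None:
        # Hypothetical cap: at most max_np_ratio negatives per positive
        # (an empty positive set counts as one positive to avoid an empty batch).
        neg_cap = min(neg_cap, int(max_np_ratio * max(n_pos, 1)))
    n_neg = min(len(neg), neg_cap)
    return rng.sample(list(pos), n_pos), rng.sample(list(neg), n_neg)


def recall_at_iou(gt: Sequence[Box], proposals: Sequence[Box],
                  thresh: float = 0.7) -> float:
    """Fraction of ground-truth boxes covered by some proposal with IoU >= thresh."""
    if not gt:
        return 0.0
    hits = sum(1 for g in gt if any(iou(g, p) >= thresh for p in proposals))
    return hits / len(gt)


rng = random.Random(0)
pos, neg = list(range(8)), list(range(8, 2000))  # 8 positive, 1992 negative anchors
p, n = sample_minibatch(pos, neg, 256, None, rng)
print(len(p), len(n))  # 8 248 (N/P = 31)
p, n = sample_minibatch(pos, neg, 32, 1.5, rng)
print(len(p), len(n))  # 8 12 (N/P = 1.5)
print(recall_at_iou([(0, 0, 10, 10), (20, 20, 30, 30)],
                    [(0, 0, 10, 9), (20, 20, 25, 25)]))  # 0.5
```

With only 8 positive anchors, the uncapped 256-anchor batch is dominated by negatives (N/P = 31), while the capped 32-anchor batch keeps N/P at 1.5, which is the kind of imbalance the cascaded RPN is said to avoid.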
2. Global Context
- With global context, global features are extracted from the whole image and concatenated with the ROI features to improve classification accuracy.
- This gives a 3.8% mAP gain.
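As a minimal sketch of the concatenation idea, assuming a single 1×1 average-pooled descriptor per region (real RoI pooling produces a fixed spatial grid, and the review does not say how Hikvision pools the global feature), the step can be written in plain Python. The helper names are hypothetical; in a real detector, the concatenated vector would feed the classification head.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def avg_pool_region(fmap: FeatureMap, x1: int, y1: int, x2: int, y2: int) -> List[float]:
    """Average every channel over the cells with y1 <= row < y2 and x1 <= col < x2."""
    pooled = []
    for channel in fmap:
        cells = [channel[r][c] for r in range(y1, y2) for c in range(x1, x2)]
        pooled.append(sum(cells) / len(cells))
    return pooled


def roi_with_global_context(fmap: FeatureMap, roi: Tuple[int, int, int, int]) -> List[float]:
    """Concatenate an RoI descriptor with a whole-image (global) descriptor."""
    height, width = len(fmap[0]), len(fmap[0][0])
    roi_feat = avg_pool_region(fmap, *roi)                    # local: inside the RoI only
    global_feat = avg_pool_region(fmap, 0, 0, width, height)  # global: the whole image
    return roi_feat + global_feat  # list concatenation = channel-wise feature concat


# Toy 2-channel, 2x2 feature map; the RoI covers only the top-left cell.
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 8.0]]]
print(roi_with_global_context(fmap, (0, 0, 1, 1)))  # [1.0, 0.0, 2.5, 2.0]
```

The RoI half of the vector sees only the top-left cell, while the global half summarizes the whole map, so the classifier receives both local evidence and scene context.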
3. Other Techniques
- Pre-training on ImageNet LOC: 0.5% mAP gain.
- Balanced Sampling: 0.7% mAP gain on VOC 2007.
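The review does not define balanced sampling. Assuming it means class-balanced sampling of training images (first choose a class uniformly, then an image that contains it), so that rare classes are seen about as often as frequent ones, a minimal sketch could look like this. Both functions are hypothetical illustrations, not Hikvision's code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def build_class_index(image_labels: Dict[str, Sequence[str]]) -> Dict[str, List[str]]:
    """Map each class to the images containing at least one object of that class."""
    index: Dict[str, List[str]] = defaultdict(list)
    for image_id, labels in image_labels.items():
        for cls in set(labels):  # list each image once per class it contains
            index[cls].append(image_id)
    return dict(index)


def balanced_sample(index: Dict[str, List[str]], n: int, rng: random.Random) -> List[str]:
    """Draw n image ids: pick a class uniformly, then an image of that class uniformly."""
    classes = sorted(index)  # fixed order so a seeded rng is reproducible
    return [rng.choice(index[rng.choice(classes)]) for _ in range(n)]


# 99 "dog" images and one "cat" image: uniform image sampling shows the cat ~1% of the time.
labels = {f"img{i}": ["dog"] for i in range(99)}
labels["img99"] = ["cat"]
draws = balanced_sample(build_class_index(labels), 10000, random.Random(0))
print(draws.count("img99") / len(draws))  # close to 0.5
```

An image containing several classes is listed under each of them, so under this scheme it is drawn more often than a single-class image.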
4. Summary of Object Detection Elements
- Pre-Activation ResNet is used as the backbone network.
5. Results
5.1. ILSVRC 2016 Detection Challenge
- With a single model, Hikvision actually ranks 1st, ahead of the CUImage team, which used GBD-Net.
- However, with model ensembles, CUImage's GBD-Net obtains better results.
5.2. ILSVRC 2016 Localization Challenge
- Hikvision ranks 2nd, with a classification error of 3.7% and a localization error of 8.7%.
5.3. PASCAL VOC 2012
- Hikvision's detector outperforms ResNet.
By combining these techniques, Hikvision obtained 2nd place in the ILSVRC 2016 detection challenge.
[2016 ECCV] [Hikvision] (Slides Only)
Towards Good Practices for Recognition & Detection
My Previous Reviews
[LeNet] [AlexNet] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [MSDNet]
[OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [G-RMI] [GBD-Net / GBD-v1 & GBD-v2] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]