GluonCV 0.4: further and faster

Source: Deep Learning on Medium


Author: Zhi Zhang, Applied Scientist at Amazon

Based on user feedback and requests, we are happy to announce the new features in GluonCV 0.4:

  • New application: Human Pose Estimation model
  • Faster deployment: INT8 models and pruned ResNets for faster inference
  • Better base classification networks: ResNeXt and SE_ResNeXt series (we provide models with 80% to 81+% accuracy, pre-trained on ImageNet)
  • Faster/Mask-RCNN models with Feature Pyramid Networks (FPN)

Meanwhile, the usability and stability of existing modules have been improved dramatically.

Human Pose Estimator

Human pose estimation models are crucial for analyzing human behavior. GluonCV provides a complete toolkit for human pose applications, including network definitions, training scripts, loss functions, and metrics. We also provide tutorials to help you bootstrap your own applications.
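For readers new to the task, here is a minimal sketch of a decoding step that many pose estimators share: turning per-joint heatmaps into keypoint coordinates. It assumes a heatmap-based model (one score map per joint), which this post does not spell out. The function names are hypothetical and are not part of the GluonCV API.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # rows of scores, indexed as heatmap[y][x]


def heatmap_to_keypoint(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-scoring cell in one joint heatmap."""
    best_x, best_y, best_score = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, best_score


def decode_pose(heatmaps: List[Heatmap]) -> List[Tuple[int, int, float]]:
    """Decode one keypoint per joint heatmap."""
    return [heatmap_to_keypoint(h) for h in heatmaps]


# Toy input: two 3x3 heatmaps standing in for two joints (made-up values).
toy_heatmaps = [
    [[0.0, 0.1, 0.0], [0.2, 0.9, 0.1], [0.0, 0.1, 0.0]],
    [[0.0, 0.0, 0.7], [0.0, 0.1, 0.0], [0.0, 0.0, 0.0]],
]
keypoints = decode_pose(toy_heatmaps)
```

In a real pipeline the coordinates would still need to be scaled from heatmap resolution back to the input image, a step this sketch leaves out.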

Let’s see some awesome real-life examples:

The following table summarizes our pre-trained human pose estimation models on the COCO dataset, which achieve state-of-the-art performance.

Deploy with INT8

We have been collaborating closely with Intel and are introducing INT8 model deployment in GluonCV. Powered by Intel Deep Learning Boost (VNNI), INT8 quantized models in GluonCV can achieve a significant speedup over 32-bit floating point operators. Benchmark on AWS EC2 C5 instances:

You can now use INT8 versions of models directly from the GluonCV model zoo. For example, resnet50_v1_int8 is the quantized version of resnet50_v1. Later we will introduce an API to convert all models to INT8. Note that, because of hardware instruction limitations, you will need a Skylake or newer Intel CPU to achieve a reasonable speedup.
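To make the idea behind INT8 models concrete, here is a minimal sketch of symmetric per-tensor quantization: floats are mapped to integers in [-127, 127] with one shared scale, then mapped back. This is an illustration of the general technique only, not the calibration or quantization procedure that GluonCV or MXNet actually uses. The function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid dividing by zero when every value is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


# Made-up weights, used only to show the round trip.
weights = [0.5, -1.27, 0.0, 1.27]
q_weights, scale = quantize_int8(weights)
restored = dequantize_int8(q_weights, scale)
```

Each restored value differs from the original by at most half the scale, which is the rounding error that INT8 inference trades for speed.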

Pruned ResNet

CNNs are redundant in most cases, so GluonCV 0.4 provides a set of ResNets with pruned structures and parameters. You can now achieve up to 9 times faster inference without losing significant accuracy.

For instance, resnet50_v1d_0.37 contains roughly 0.37x the parameters of resnet50_v1d, and its listed speedup of 5.01x indicates that it is about 5 times faster during inference. You can refer to a more intuitive version here to choose the right network for your purposes.
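As a quick worked example of how to read these two ratios, the sketch below turns them into an estimated parameter count and latency. The 0.37 and 5.01 figures come from the text above. The baseline parameter count and latency are placeholder values chosen for illustration, not measurements, and the function name is hypothetical.

```python
from typing import Tuple


def pruned_estimates(
    base_params: float,
    base_latency_ms: float,
    param_ratio: float,
    speedup: float,
) -> Tuple[float, float]:
    """Estimate parameter count and latency of a pruned model from its ratios."""
    return base_params * param_ratio, base_latency_ms / speedup


# Placeholder baseline: 25M parameters and 10 ms per image (assumed values).
params, latency_ms = pruned_estimates(
    base_params=25_000_000,
    base_latency_ms=10.0,
    param_ratio=0.37,  # from the model name resnet50_v1d_0.37
    speedup=5.01,      # from the speedup figure quoted above
)
```

Under these placeholder inputs the pruned model would have roughly 9.25M parameters and take about 2 ms per image.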

More interesting GANs

Super-resolution GAN(SRGAN)

Style Transfer CycleGAN

Faster/Mask-RCNN with FPN

SoTA performance provided by Faster/Mask-RCNN with FPN

Improvements and Bug fixes

  • All ResNets and variants now support SynchronizedBatchNorm in Model Zoo
  • Pre-trained object detection models now support reset_class. By specifying reuse_weights, a model can reuse what it has already learned about previous categories, allowing it to detect those classes without fine-tuning. Please refer to this tutorial.
  • Fix DataLoader crashing problems (mxnet>=1.4.0)
  • Now PSP and DeepLabv3 models can hybridize like other models
  • Fix some random NaN problems (requires mxnet nightly)
  • Improve GPU NMS op (requires mxnet nightly)
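The reset_class item in the list above can be pictured as follows. This is a conceptual sketch in plain Python, not the GluonCV implementation: it assumes the class prediction layer stores one weight row per class, and that reusing weights means copying the row of a matching old class. The helper reset_class_sketch and its signature are hypothetical.

```python
from typing import Dict, List, Optional


def reset_class_sketch(
    old_classes: List[str],
    old_weights: List[List[float]],
    new_classes: List[str],
    reuse_weights: Optional[Dict[str, str]] = None,
) -> List[List[float]]:
    """Build per-class weight rows for new_classes.

    reuse_weights maps a new class name to an old class name. Mapped rows
    are copied from the old layer. Unmapped rows start at zero here (a real
    network would initialize them randomly) and would still need training.
    """
    reuse_weights = reuse_weights or {}
    width = len(old_weights[0])
    old_index = {name: i for i, name in enumerate(old_classes)}
    new_weights = []
    for name in new_classes:
        source = reuse_weights.get(name)
        if source is not None:
            new_weights.append(list(old_weights[old_index[source]]))
        else:
            new_weights.append([0.0] * width)
    return new_weights


# Made-up classes and weights, used only to show the row copying.
old_classes = ["person", "car", "dog"]
old_weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
new_weights = reset_class_sketch(
    old_classes,
    old_weights,
    new_classes=["dog", "bicycle"],
    reuse_weights={"dog": "dog"},
)
```

Here the "dog" row is carried over unchanged, so the model can keep detecting dogs immediately, while "bicycle" starts from scratch.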

GluonCV training tricks in v0.3 are now public

We have published the training tricks used in v0.3, which had a significant impact on our pre-trained models.

Acknowledgment

We sincerely appreciate our contributors: @xinyu-intel @hetong007 @zhreshold @khetan2 @chinakook @Jerryzcn @husonchen @zhanghang1989 @sufeidechabei @brettkoonce @mli @lgov @djl11 @YutingZhang @mzchtx @sharmalakshay93 @astonzhang @LcDog @zx-code123 @adursun @ifeherva @ZhennanQin @islinwh @jianantian @feynmanliang @ivechan @eric-haibin-lin

Links

Please Like/Star/Fork/Comment/Contribute if you like GluonCV!

References

[1] He T, Xie J, Zhang Z, et al. Bag of tricks for image classification with convolutional neural networks[J]. arXiv preprint arXiv:1812.01187, 2018.

[2] Zhang Z, He T, Zhang H, et al. Bag of Freebies for Training Object Detection Neural Networks[J]. arXiv preprint arXiv:1902.04103, 2019.

[3] Intel Deep Learning Boost. https://www.intel.ai/intel-deep-learning-boost

[4] Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2223–2232.

[5] Ledig C, Theis L, Huszár F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.