SpineNet: An Unconventional Backbone Architecture from Google Brain

Original article was published on Artificial Intelligence on Medium

Comparative Results

The experiments cover both object detection and image classification to demonstrate the versatility of the proposed architecture.

Object Detection

For object detection, the ResNet-FPN backbone in the RetinaNet detector is replaced with SpineNet. The model is trained on the COCO train2017 split and evaluated on COCO test-dev.

  • The following results (Figure 6) demonstrate that SpineNet models outperform other popular detectors by large margins. The largest model, SpineNet-190, achieves the highest AP of 52.1%. In general, SpineNet architectures require fewer FLOPs and fewer parameters, which makes the models computationally less expensive.
Figure 6: One-stage object detection results on COCO test-dev. Different backbones are employed with RetinaNet, and all results are for a single model. By default, all models in this table are trained with multi-scale training and ReLU activation. Models marked with a dagger (†) are trained with stochastic depth and swish activation on a longer training schedule. [Source: [1]]
  • The following results (Figure 7) on COCO val2017 demonstrate that SpineNet-49 requires ~10% fewer FLOPs than R50-FPN while improving AP from 37.8 to 40.8.
Figure 7: Results comparisons between R50-FPN and scale-permuted models on COCO val2017. [Source: [1]]
  • RetinaNet models adopting SpineNet backbones achieve higher AP with considerably fewer FLOPs than those adopting ResNet-FPN and NAS-FPN backbones (Figure 8).
Figure 8: The comparison of RetinaNet models adopting SpineNet, ResNet-FPN, and NAS-FPN backbones. [Source: [1]]
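
To put the Figure 7 numbers in perspective, the AP gain can be expressed both in absolute points and relative to the baseline. The snippet below is only a small arithmetic sketch: the helper name `compare_ap` is made up for illustration, and the two AP values are the ones quoted above for R50-FPN and SpineNet-49.

```python
def compare_ap(baseline_ap, candidate_ap):
    """Return (absolute gain in AP points, relative gain in percent)."""
    absolute = candidate_ap - baseline_ap
    relative = 100.0 * absolute / baseline_ap
    return absolute, relative


# AP values quoted in the text for COCO val2017 (Figure 7).
r50_fpn_ap = 37.8
spinenet49_ap = 40.8

gain, rel = compare_ap(r50_fpn_ap, spinenet49_ap)
# prints: AP gain: 3.0 points (7.9% relative)
print(f"AP gain: {gain:.1f} points ({rel:.1f}% relative)")
```

In other words, the 3.0-point improvement corresponds to roughly an 8% relative gain over R50-FPN, obtained while using ~10% fewer FLOPs.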

Image Classification

For image classification, SpineNet is trained on two datasets: ImageNet ILSVRC-2012 and iNaturalist-2017.

  • On ImageNet, the Top-1 and Top-5 accuracies are on par with ResNet, while the number of FLOPs is considerably reduced.
  • On iNaturalist, SpineNet outperforms ResNet by a large margin of 5%, together with a reduction in FLOPs.
Figure 9: Image classification results on ImageNet and iNaturalist. [Source: [1]]
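
For readers unfamiliar with the Top-1 and Top-5 metrics reported above, the following is a minimal pure-Python sketch of how they are usually computed. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not SpineNet outputs or code from the paper.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy scores for 3 samples over 6 classes (made-up numbers for illustration).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # highest score: class 1
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],  # highest score: class 0
    [0.02, 0.08, 0.10, 0.10, 0.30, 0.40],  # highest score: class 5
]
labels = [1, 2, 0]  # true classes

print(f"Top-1: {top_k_accuracy(scores, labels, k=1):.2f}")  # Top-1: 0.33
print(f"Top-5: {top_k_accuracy(scores, labels, k=5):.2f}")  # Top-5: 0.67
```

Top-1 counts a prediction as correct only when the highest-scoring class is the true one, whereas Top-5 accepts any of the five highest-scoring classes, so Top-5 accuracy is always at least as high as Top-1.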

The results above demonstrate that SpineNet not only performs better on object detection but is also versatile enough for other visual learning tasks such as image classification.