Source: Deep Learning on Medium
Distance-IoU Loss: An Improvement of IoU-based Loss for Object Detection Bounding Box Regression
In object detection, we try to teach the computer to predict the objects in a given image and their locations. To accomplish this, we mimic the act of “localizing” an object by enclosing it in a rectangle, usually called a bounding box.
A bounding box is usually annotated with four values that represent either the coordinates of specific corner or center points or the box's width and height. The usual annotation formats are (left, top, right, bottom), (left, top, width, height), and (center_x, center_y, width, height). Deep learning models for object detection are therefore designed to predict these values by regression for the localization part of the task.
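The three annotation formats carry the same information, so they can be converted into one another. The helper names below are hypothetical and only meant as a minimal sketch of the conversions:

```python
def ltwh_to_ltrb(box):
    """(left, top, width, height) -> (left, top, right, bottom)."""
    left, top, width, height = box
    return (left, top, left + width, top + height)


def cxcywh_to_ltrb(box):
    """(center_x, center_y, width, height) -> (left, top, right, bottom)."""
    cx, cy, width, height = box
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


def ltrb_to_cxcywh(box):
    """(left, top, right, bottom) -> (center_x, center_y, width, height)."""
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2, right - left, bottom - top)


# The same 20x10 box written in two of the formats
print(ltwh_to_ltrb((10, 20, 20, 10)))    # (10, 20, 30, 30)
print(cxcywh_to_ltrb((20, 25, 20, 10)))  # (10.0, 20.0, 30.0, 30.0)
```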
Earlier works such as SSD (Liu et al. 2016), YOLOv3 (Redmon and Farhadi 2018), and Faster R-CNN (Ren et al. 2015) adopted a conventional Ln-norm (n = 1, 2) loss on the coordinates of the predicted and ground-truth bounding boxes to measure the distance between them. However, the performance of an object detection model in the COCO object detection challenge is measured by Average Precision (AP), which is tied to the Intersection-over-Union (IoU) of the predicted and ground-truth bounding boxes rather than to their point coordinates. Yu et al. 2016 and Rezatofighi et al. 2019 therefore argue in their papers that an Ln-norm loss is not a suitable choice for obtaining the optimal IoU. This motivated IoU-based bounding box regression losses: first the IoU loss of Yu et al. 2016, then its generalization by Rezatofighi et al. 2019.
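To optimize the IoU metric directly, the IoU loss was suggested first. With A the predicted box and B the ground-truth box, it is commonly written as:
$$\mathcal{L}_{IoU} = 1 - IoU = 1 - \frac{|A \cap B|}{|A \cup B|}$$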
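As a minimal sketch of the IoU loss in its common 1 − IoU form, assuming boxes are given as (left, top, right, bottom) tuples with positive area (the function names are illustrative, not from any library):

```python
def box_area(box):
    """Area of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return max(0.0, right - left) * max(0.0, bottom - top)


def iou(a, b):
    """Intersection-over-Union of two (left, top, right, bottom) boxes."""
    # Width and height of the overlap, clamped at zero when boxes are disjoint
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, gt):
    """IoU loss in the 1 - IoU form."""
    return 1.0 - iou(pred, gt)


gt = (0, 0, 2, 2)
print(iou_loss((1, 1, 3, 3), gt))      # roughly 0.857 (IoU = 1/7)
print(iou_loss((3, 3, 5, 5), gt))      # 1.0, no overlap
print(iou_loss((10, 10, 12, 12), gt))  # 1.0, no overlap and much farther away
```

The last two calls return the same value even though the second prediction is much farther from the ground truth, so the loss cannot tell a near miss from a distant one.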
However, this loss only works when the bounding boxes overlap and provides no learning signal for non-overlapping cases. To address this, Rezatofighi et al. 2019 add a penalty term to the loss, where C denotes the smallest box that encloses both the predicted bounding box A and the ground-truth bounding box B, and call the result the Generalized IoU (GIoU) loss:
$$\mathcal{L}_{GIoU} = 1 - IoU + \frac{|C - A \cup B|}{|C|}$$
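The GIoU penalty can be sketched in a few lines. This is an illustrative implementation under the assumption that boxes are (left, top, right, bottom) tuples with positive area; `giou_loss` is a hypothetical name:

```python
def giou_loss(pred, gt):
    """GIoU loss: 1 - IoU + |C - (A u B)| / |C| for (left, top, right, bottom) boxes."""
    # Intersection and union areas
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    iou = inter / union

    # Smallest box C that encloses both boxes
    c_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    c_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c_area = c_w * c_h

    # Fraction of C that is covered by neither box
    penalty = (c_area - union) / c_area
    return 1.0 - iou + penalty


gt = (0, 0, 2, 2)
print(giou_loss((3, 3, 5, 5), gt))      # about 1.68
print(giou_loss((10, 10, 12, 12), gt))  # about 1.94, farther away gives a larger loss
```

Unlike the plain IoU loss, the two non-overlapping predictions now receive different loss values.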
With this loss, the predicted bounding box first tries to expand toward the ground-truth bounding box so that the two overlap, which moves the prediction closer to the correct one. However, when one box encloses the other, the penalty term vanishes and GIoU falls back to the standard IoU loss. Empirically, it also needs more time to converge because it relies heavily on the IoU term. These issues were identified and explained by Zheng et al. 2020, who propose a better IoU-based loss called the Distance-IoU (DIoU) loss and its complete form, the Complete-IoU (CIoU) loss.
So, what is the Distance-IoU loss? It is still an IoU-based loss like the ones above, and it also adds a penalty term to handle non-overlapping cases. Its penalty term, however, is based on the center point coordinates of the bounding boxes. So we can say that this loss combines an IoU-based loss with a modified Ln-norm-based loss. The formula of the Distance-IoU loss is:
$$\mathcal{L}_{DIoU} = 1 - IoU + \frac{\rho^2(b, b_{gt})}{c^2}$$
Here ρ(·) denotes the Euclidean distance.
The penalty term directly minimizes the distance between b, the center point of the predicted bounding box, and b_gt, the center point of the ground-truth box. The distance is normalized by the denominator c, the diagonal length of the smallest enclosing box covering the two boxes.
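The description above translates fairly directly into code. The following is a minimal sketch, not the authors' implementation, assuming (left, top, right, bottom) boxes with positive area and a hypothetical function name `diou_loss`:

```python
def diou_loss(pred, gt):
    """DIoU loss: 1 - IoU + (squared center distance) / (squared enclosing diagonal)."""
    # Intersection-over-Union
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_pred + area_gt - inter)

    # Squared Euclidean distance between the two center points b and b_gt
    pred_cx, pred_cy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gt_cx, gt_cy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    center_dist_sq = (pred_cx - gt_cx) ** 2 + (pred_cy - gt_cy) ** 2

    # Squared diagonal length c^2 of the smallest enclosing box
    c_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    c_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    diag_sq = c_w ** 2 + c_h ** 2

    return 1.0 - iou + center_dist_sq / diag_sq


gt = (0, 0, 2, 2)
print(diou_loss((3, 3, 5, 5), gt))      # about 1.36 (penalty 18 / 50)
print(diou_loss((10, 10, 12, 12), gt))  # larger, since the centers are farther apart
```

Because the penalty is a ratio of two squared lengths, it does not change when both boxes are scaled by the same factor.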
This proposed loss inherits some properties from its predecessors:
- The DIoU loss is still invariant to the scale of the regression problem
- Like GIoU, it provides a moving direction for the predicted bounding box when it does not overlap the ground truth
And its merits are:
- It converges faster than GIoU because it directly minimizes the distance between center points rather than the area of the minimum enclosing box.
- When one bounding box includes the other, DIoU can still provide a useful learning signal, while GIoU degrades almost to the standard IoU loss, i.e. |C − A ∪ B| → 0
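The inclusion case can be checked numerically. The sketch below places two 2×2 predictions inside a 4×4 ground-truth box, one centered and one in a corner, and computes all three losses. The helper `compare_losses` is a hypothetical name and the boxes are assumed to be (left, top, right, bottom) tuples:

```python
def compare_losses(pred, gt):
    """Return (IoU loss, GIoU loss, DIoU loss) for two (left, top, right, bottom) boxes."""
    # Intersection and union areas
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    iou = inter / union

    # Smallest enclosing box C
    c_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    c_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    giou_penalty = (c_w * c_h - union) / (c_w * c_h)

    # Squared center distance over squared diagonal of C
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    diou_penalty = (dx ** 2 + dy ** 2) / (c_w ** 2 + c_h ** 2)

    return (1.0 - iou, 1.0 - iou + giou_penalty, 1.0 - iou + diou_penalty)


gt = (0, 0, 4, 4)
print(compare_losses((1, 1, 3, 3), gt))  # (0.75, 0.75, 0.75)   centered prediction
print(compare_losses((0, 0, 2, 2), gt))  # (0.75, 0.75, 0.8125) corner prediction
```

In this example the IoU and GIoU losses are identical for both predictions, while the DIoU loss is higher for the off-center one.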
Furthermore, from their analysis of the previous methods, they suggest that a good bounding box regression loss should account for three important geometric factors, which the earlier losses address only in part:
- Overlap area: handled by the IoU loss
- Central point distance: handled indirectly by GIoU (by moving the predicted bounding box toward the ground truth), further improved by DIoU (by directly minimizing it)
- Aspect ratio: not addressed by the IoU, GIoU, or DIoU loss
Thus, they propose the Complete-IoU (CIoU) loss, which addresses all three. It is the DIoU loss with an additional penalty based on the consistency of the aspect ratio:
$$\mathcal{L}_{CIoU} = 1 - IoU + \frac{\rho^2(b, b_{gt})}{c^2} + \alpha\upsilon$$
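Here α is a positive trade-off parameter and υ measures the consistency of the aspect ratio. With w, h the width and height of the predicted box and w_gt, h_gt those of the ground truth, υ is defined by
$$\upsilon = \frac{4}{\pi^2}\left(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w}{h}\right)^2$$
and α is
$$\alpha = \frac{\upsilon}{(1 - IoU) + \upsilon}$$
which gives the overlap area factor higher priority for regression, especially in non-overlapping cases.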
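Putting the three factors together, a CIoU loss could be sketched as follows. This is an illustrative plain-Python version, not the authors' training code, and it assumes (left, top, right, bottom) boxes with positive width and height; `ciou_loss` is a hypothetical name:

```python
import math


def ciou_loss(pred, gt):
    """CIoU loss: DIoU loss plus the aspect-ratio term alpha * v."""
    # Overlap area factor: IoU
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    iou = inter / (w * h + w_gt * h_gt - inter)

    # Center distance factor: squared center distance over squared diagonal
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    c_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    c_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    distance_term = (dx ** 2 + dy ** 2) / (c_w ** 2 + c_h ** 2)

    # Aspect ratio factor: v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    denom = (1.0 - iou) + v
    alpha = v / denom if denom > 0 else 0.0  # guard for identical boxes

    return 1.0 - iou + distance_term + alpha * v


gt = (0, 0, 4, 4)
square = (1, 1, 3, 3)      # same center and aspect ratio as gt, area 4
wide = (0, 1.5, 4, 2.5)    # same center and area, but 4:1 aspect ratio
print(ciou_loss(square, gt))  # 0.75, only the IoU term contributes
print(ciou_loss(wide, gt))    # larger than 0.75 because of the aspect-ratio penalty
```

Both predictions have the same IoU and the same center as the ground truth, so only the aspect-ratio term separates them.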
In conclusion, IoU-based losses have been shown to outperform Ln-norm-based losses. Furthermore, this paper argues that a good bounding box regression loss should cover three aspects: overlap area, center point distance, and aspect ratio. The CIoU loss is proposed to address all of them.
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; and Berg, A. C. 2016. SSD: Single Shot MultiBox Detector. In The European Conference on Computer Vision (ECCV).
Redmon, J.; and Farhadi, A. 2018. YOLOv3: An Incremental Improvement. arXiv:1804.02767.
Ren, S.; He, K.; Girshick, R.; and Sun, J. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems 28.
Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; and Huang, T. 2016. UnitBox: An Advanced Object Detection Network. In Proceedings of the ACM International Conference on Multimedia.
Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; and Savarese, S. 2019. Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; and Ren, D. 2020. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In AAAI 2020.