Understanding Focal Loss — A Quick Read


Yes, you might already have an idea of what I will be discussing in this blog ;p. But before getting to the main topic, I want to cover some prerequisite points.

  1. In the case of object detection:
  • Positive examples: Target class or foreground information, such as anchors matched to ground truths.
  • Negative examples: Non-target class or background information, such as anchors whose IoU with every ground truth is less than a given threshold.
  • Easy positives/negatives: Samples correctly classified as positive/negative examples.
  • Hard positives/negatives: Positive/negative samples misclassified as the opposite class. (These four buckets are made concrete in the sketch after this list.)
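
To make the four buckets concrete, here is a minimal sketch; the IoU values, predicted probabilities, and the 0.5 thresholds below are made up purely for illustration:

import numpy as np

iou_with_gt  = np.array([0.8, 0.1, 0.6, 0.2])   # best IoU of each anchor with any ground truth
pred_fg_prob = np.array([0.9, 0.2, 0.3, 0.7])   # predicted foreground probability per anchor

is_positive     = iou_with_gt > 0.5             # positives: anchors matched to a ground truth
is_predicted_fg = pred_fg_prob > 0.5

easy_pos = is_positive & is_predicted_fg        # correctly classified positives
hard_pos = is_positive & ~is_predicted_fg       # positives misclassified as background
easy_neg = ~is_positive & ~is_predicted_fg      # correctly classified negatives
hard_neg = ~is_positive & is_predicted_fg       # negatives misclassified as foreground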

2. Class Imbalance Problem

  • This is observed when information related to one class in a dataset, or in the mini-batches used during training, is over-represented compared to the other classes.
  • Training a network on an imbalanced dataset biases the network towards learning richer representations of the dominant class, while the other classes are overlooked.
  • In the case of object detection, two confidence values are predicted: an objectness score (whether an object is present in a box or not) and a class score (which class the detected object belongs to).
  • So it becomes important to maintain a balanced state between foreground-background and foreground-foreground classes while training. If not handled, the former creates box confidence errors (whether or not an object is present) and the latter creates class confidence errors (if an object is present in the box, which class it represents) during training. A minimal sketch of this two-score output follows Fig 1.
Fig 1: (Left) An imbalance between background/foreground. (Right) An imbalance between foreground/foreground. The numbers are from RetinaNet on the COCO dataset (80 classes) [5].
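
As a rough illustration of the two confidence values, a YOLO-style head predicts, per anchor, box offsets, an objectness score, and per-class scores. The output layout below is an assumption made for illustration:

import numpy as np

num_anchors, num_classes = 3, 80
# Hypothetical raw head output for one grid cell: per anchor,
# [tx, ty, tw, th, objectness, class_1, ..., class_80]
head_out = np.random.randn(num_anchors, 5 + num_classes)

box_offsets  = head_out[:, :4]
objectness   = 1 / (1 + np.exp(-head_out[:, 4]))    # box confidence: is an object present?
class_scores = 1 / (1 + np.exp(-head_out[:, 5:]))   # class confidence: which class is it?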
  • Two-stage detectors have region proposal extractors that give out many proposals (both positive and negative), which are further filtered by the sampling techniques mentioned in [1], such as Online Hard Example Mining and IoU/objectness thresholding.
  • On the contrary, one-stage detectors do not contain a region proposal architecture. They directly perform global regression/classification on feature maps and generate a fixed number of anchors per location, so it becomes difficult to apply these sampling heuristics to get rid of unwanted negative samples.
  • Even if applied, the network still becomes biased towards learning background information, which is of no use. This is illustrated in the snippet below.
Fig 2: (Left) Grid locations. (Right) Proposals. Blue: ground-truth boxes, Red: negative proposals, Green: positive proposals.
# Configuration set for the YOLO network (IoU thresholding applied)
Image Size = (400, 400)                         # Image input dimensions
Output Feature Map Size = (50, 50)              # Stride of 8
Anchor Boxes = [(45, 90), (90, 90), (90, 45)]
Ground Truth = [[50, 100, 150, 150], [250, 210, 300, 310]]

1. Total anchors across all locations: 7500 (50 × 50 × 3)
2. After removing invalid anchors, total remaining anchors: 5301
3. Total number of positive RoIs (IoU > 0.5): 42
4. Total number of negative RoIs (IoU <= 0.5): 5259
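
The counts above can be roughly reproduced with a few lines of NumPy. This is a sketch under assumptions (anchors centred on the stride-8 grid, anchors crossing the image boundary treated as invalid), so the exact numbers may differ slightly:

import numpy as np

def iou(box, gts):
    # box: (x1, y1, x2, y2); gts: (N, 4) in the same corner format
    x1 = np.maximum(box[0], gts[:, 0]); y1 = np.maximum(box[1], gts[:, 1])
    x2 = np.minimum(box[2], gts[:, 2]); y2 = np.minimum(box[3], gts[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gts = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    return inter / (area_box + area_gts - inter)

stride, size = 8, 400
anchors_wh = [(45, 90), (90, 90), (90, 45)]
gts = np.array([[50, 100, 150, 150], [250, 210, 300, 310]])

pos = neg = 0
for cy in range(stride // 2, size, stride):          # anchor centres on the 50x50 grid
    for cx in range(stride // 2, size, stride):
        for w, h in anchors_wh:
            box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            if box[0] < 0 or box[1] < 0 or box[2] > size or box[3] > size:
                continue                             # drop invalid (out-of-image) anchors
            if iou(box, gts).max() > 0.5:
                pos += 1
            else:
                neg += 1
print(pos, neg)                                      # positives are a tiny fraction of the valid anchors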
  • As seen from Fig. 2 and the snippet above, there is a vast difference between the numbers of positive and negative samples, indicating a severe imbalance between foreground and background in these kinds of detectors (YOLOv3 [2] has 3 anchors per location on a feature map at a particular scale).
  • Some of the challenges faced during training due to imbalance in a dataset are (as stated in [5]):
  1. Training becomes inefficient, as most of the samples are easy negatives that contribute no useful learning signal. This kind of bias makes it difficult for the network to learn rich semantic relationships from the images.
  2. The cumulative loss of the easy negatives overwhelms the total loss, which degenerates the model (a quick back-of-the-envelope check follows this list).
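
The counts from the snippet above make point 2 concrete; the per-sample probabilities 0.9 and 0.1 below are assumptions for illustration:

import math

n_easy_neg, n_hard_pos = 5259, 42
easy_neg_loss = -math.log(0.9)      # CE of an easy negative predicted correctly with p = 0.9, ~0.105
hard_pos_loss = -math.log(0.1)      # CE of a hard positive predicted with only p = 0.1, ~2.303

print(n_easy_neg * easy_neg_loss)   # ~554: cumulative loss from the easy negatives
print(n_hard_pos * hard_pos_loss)   # ~97: cumulative loss from the hard positives

Even though each easy negative contributes almost nothing individually, together they dominate the total loss by roughly a factor of five, so the gradients mostly chase background.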

To solve the above problems, Facebook AI Research came up with a modified approach that adds a weighting factor to the cross-entropy loss. Let's now get straight into the topic this article is meant for. Brush up ;).
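
As a preview, the focal loss from [5] scales the cross-entropy term by a modulating factor (1 - p_t)^γ, optionally combined with a class weight α_t, so that well-classified examples contribute almost nothing to the total loss. A minimal NumPy sketch of the binary case:

import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)
    # p: predicted foreground probability, y: 1 for foreground, 0 for background
    p_t     = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy negative (p = 0.1, so p_t = 0.9) is down-weighted by (1 - 0.9)**2 = 0.01,
# while a hard positive (p = 0.1, so p_t = 0.1) keeps most of its cross-entropy loss.
print(focal_loss(np.array([0.1]), np.array([0])))   # ~0.0008
print(focal_loss(np.array([0.1]), np.array([1])))   # ~0.47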