Source: Deep Learning on Medium
Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression
This paper is about keypoint detection → done via heatmap regression → the loss function for this setting is understudied, hence → they propose a novel loss function. (The good thing about it → foreground pixels get more credit)
A lot of thought has gone into this design → it did not come out of nowhere.
There are a lot of applications for keypoint detection → and deep learning is the usual solution. (but we can be more precise) → what they do here is focus on the foreground pixels. (with their loss → the heatmaps become sharper)
The usual way to train these models is with MSE → but this is not a good choice since it is not robust to outliers. (They use → foreground information for better estimation) → and the authors' loss function performs much better.
So this is much like the adaptive cross-entropy (focal loss) from Facebook → they play with an adaptive loss to get better gradients. (and of course, they use a CNN → a stacked encoder-decoder, hourglass-style architecture).
Most of the background research → is on the application side rather than the loss function itself → so there is not much prior work here. (there are some existing loss functions → but they have limited use cases → the authors' loss function is much better)
As seen in the output → it is a 3D tensor with one channel per keypoint → this is important, and we can even encode boundary information as an extra channel.
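As a rough sketch of that output encoding (not the authors' code): each keypoint is typically rendered as a small Gaussian blob in its own channel, giving one heatmap per landmark. The function name, resolution, and sigma below are illustrative assumptions.

```python
import numpy as np

def keypoints_to_heatmaps(keypoints, size=64, sigma=1.0):
    """Encode K (x, y) keypoints as a (K, size, size) tensor of Gaussian
    heatmaps -- one channel per keypoint, peak value 1.0 at the landmark."""
    ys, xs = np.mgrid[0:size, 0:size]
    maps = np.zeros((len(keypoints), size, size), dtype=np.float32)
    for k, (x, y) in enumerate(keypoints):
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

heatmaps = keypoints_to_heatmaps([(20, 30), (40, 10)])
print(heatmaps.shape)  # (2, 64, 64)
```

The network regresses this tensor directly, so the loss is computed per pixel rather than per coordinate.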
And we can compare the gradients of the different losses. (the model was inspired by another paper → which contributes a coordinate-encoding method) There is some theory behind the loss function → the main takeaway is to stabilize the gradient. (basically covering what the other loss functions cannot provide → the gradient of a loss like MSE is simply linear in the error)
The above was the original 'wing loss' → its gradient does not vanish as the error goes to zero → so the loss has a hard time settling exactly at zero → it never fully converges.
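For reference, a minimal sketch of the original wing loss (Feng et al.): logarithmic near zero, L1-like for large errors. The default `w` and `eps` below are the commonly cited values from that earlier paper, not settings from this one.

```python
import numpy as np

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Wing loss: w * ln(1 + |x|/eps) for |x| < w, else |x| - C.
    C is chosen so the two pieces meet continuously at |x| = w."""
    x = np.abs(pred - target)
    C = w - w * np.log(1.0 + w / eps)
    return np.where(x < w, w * np.log(1.0 + x / eps), x - C).mean()
```

Note the gradient of the log branch, w / (eps + |x|), stays at w/eps even at zero error → that is exactly the non-vanishing gradient problem mentioned above.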
So the main idea is to focus on the harder pixels → while the easy parts are already trained well → this stabilizes the training process.
There is not that much difference from the wing loss → a new variable is introduced → along with an exponent term. (when the error is close to zero → the gradient gets smaller) → so the relationship becomes nonlinear.
The adaptive wing loss → mimics several different losses → such as MSE and more → because the loss function is more sophisticated. (the data itself is harder to process → since we need to know which pixels are foreground and which are background) → if a loss function requires that kind of pre-processing → that can be a downside.
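A sketch of the adaptive wing loss itself: the exponent (alpha − y) depends on the ground-truth heatmap value y, so foreground pixels (y near 1) get a strong wing-like penalty while background pixels (y near 0) decay toward smooth MSE-like behaviour near zero error. The default constants are the values reported in the paper; treat this as an illustration, not the authors' code.

```python
import numpy as np

def adaptive_wing_loss(pred, target, alpha=2.1, omega=14.0, eps=1.0, theta=0.5):
    """Adaptive wing loss: nonlinear (log) branch for small errors,
    linear branch for large ones; A and C make the pieces meet smoothly
    at the threshold theta."""
    d = np.abs(pred - target)
    p = alpha - target                     # per-pixel exponent, driven by gt value
    A = omega * (1.0 / (1.0 + (theta / eps) ** p)) * p \
        * (theta / eps) ** (p - 1.0) / eps
    C = theta * A - omega * np.log1p((theta / eps) ** p)
    return np.where(d < theta,
                    omega * np.log1p((d / eps) ** p),
                    A * d - C).mean()
```

Because A equals the derivative of the log branch at theta, the gradient is continuous there → this is the "stabilize the gradient" point from above.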
Additionally, they have incorporated boundary information.
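Alongside the boundary channel, the paper weights the per-pixel loss with a foreground mask. A minimal sketch of that weighted loss map, assuming the paper's reported threshold of 0.2 and weight of 10 (the paper additionally dilates the mask, which is omitted here for brevity):

```python
import numpy as np

def weighted_loss_map(loss_map, target_heatmap, w=10.0, thresh=0.2):
    """Scale up the loss on foreground pixels: where the ground-truth
    heatmap exceeds `thresh`, multiply the loss by (w + 1); elsewhere
    leave it unchanged."""
    mask = (target_heatmap >= thresh).astype(loss_map.dtype)
    return loss_map * (w * mask + 1.0)
```

This is how "foreground pixels get more credit" is actually realized at training time.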
When calculating the NME (normalized mean error) → the authors' results were the lowest → which was somewhat expected. (some data augmentation was done → a reasonable amount)
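For clarity, the NME metric is just the mean Euclidean landmark error divided by a normalization distance (commonly inter-ocular or inter-pupil distance, depending on the benchmark). A minimal sketch:

```python
import numpy as np

def nme(pred, gt, norm_dist):
    """Normalized mean error: pred and gt are (N, 2) landmark arrays;
    norm_dist is e.g. the inter-ocular distance of the face."""
    return np.linalg.norm(pred - gt, axis=1).mean() / norm_dist
```

Lower is better, and normalizing by face size makes scores comparable across images.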
Across the different datasets → the authors' approach outperformed the other methods → and it was even better than human annotators! Wow.
The 300W dataset → was tried as well.
Their method fails on only 2 percent of the images.
Ablation study → what happens when each component is removed?
The hyperparameter search space was too large → so they had to make some assumptions.
The bad thing about this loss function → some hand-engineering has to be done to tune its hyperparameters. (depending on the dataset)
Quite a lot of ablation studies were done → the weighted loss really helps. (they also tried → human pose estimation → that was cool → and they showed the method works well for human pose estimation too).
The loss function was studied → and a proposal was made!