Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks

Source: Deep Learning on Medium

Another piece of research on loss functions → for keypoint detection → on facial images → can be extended to human pose as well → this wing loss is not the final version → an adaptive version was made later.

There is also data augmentation → used to extend the data where there are not many images → this is pretty interesting.

Difference between wing loss and other loss functions → the curvature is controlled → so for small and medium error values → the gradients are much larger → training stays focused on driving those errors down instead of being dominated by a few outliers → this is such an interesting idea.
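A minimal NumPy sketch (not the paper's code; `w` and `eps` are illustrative values) of the curvature point → comparing gradient magnitudes of an L2 loss and a log-type loss near zero:

```python
import numpy as np

# Gradient magnitudes for an error x >= 0:
#   L2:  d/dx (x^2 / 2)        = x             -> vanishes as x -> 0
#   log: d/dx w*ln(1 + x/eps)  = w / (eps + x) -> stays large near 0
w, eps = 10.0, 2.0
x = np.array([0.1, 1.0, 5.0])

grad_l2 = x
grad_log = w / (eps + x)

# Near zero the log-type gradient dominates the L2 one, so small and
# medium errors keep contributing meaningful gradient during training.
print(grad_l2)
print(grad_log)
```

This is the "curvature control" idea in isolation → the log region steepens the loss where errors are small.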

Other networks have been looked into → such as RNNs or autoencoders → these are all background information (but residual connections give much better representations) → and during this time → most facial landmark work used the L2 loss, which has some bad properties.

They look into how different loss functions → affect the landmark detection task → and propose a new loss function for better prediction (the loss function's properties are very important when it comes to good results).

FCN → a fully convolutional network → can be used here by taking heatmaps as ground-truth labels. (Dealing with pose variation → which is very hard → has two solutions → a hand-crafted approach that can outperform deep learning, or the other approach, a 3D facial model → this is also very powerful.)
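A hedged sketch of how a heatmap ground-truth label is commonly built for FCN-style regression → one 2-D Gaussian per landmark (the `sigma` value is an assumption, not from the paper):

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """Ground-truth heatmap for one landmark: a 2-D Gaussian
    centred on the landmark coordinate (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# Peak of the heatmap sits exactly on the landmark.
hm = gaussian_heatmap(64, 64, cx=20, cy=30)
print(np.unravel_index(hm.argmax(), hm.shape))  # (30, 20)
```

The network then regresses one such map per landmark instead of raw coordinates.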

They do not use heatmap regression → rather → they predict the coordinates directly → is that a good idea? The input is a 64 × 64 image → that is a pretty small image → a two-stage setup can be used for better results.

They analyzed different loss functions.

The graphs did not render properly → but what they show is how the error values behave for different loss functions → some are more robust to outliers while others are not (those kinds of characteristics are described).

Normalized mean error → wing loss gets the smallest value. (Training must focus on small and medium errors, not the larger ones → this is the theory behind wing loss.)
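A minimal sketch of the normalised mean error metric → assuming the normaliser is a distance such as the inter-ocular distance (a common choice for facial benchmarks; the exact normaliser here is an assumption):

```python
import numpy as np

def nme(pred, gt, d_norm):
    """Normalised mean error: average point-to-point Euclidean
    distance between predicted and ground-truth landmarks,
    divided by a normalising distance d_norm."""
    errors = np.linalg.norm(pred - gt, axis=-1)  # per-landmark L2 error
    return errors.mean() / d_norm

# Toy example: 3 landmarks, each prediction off by 1 px in x.
gt = np.array([[10.0, 10.0], [20.0, 10.0], [15.0, 20.0]])
pred = gt + np.array([1.0, 0.0])
print(nme(pred, gt, d_norm=10.0))  # 0.1
```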

To combine the properties of different loss functions → keeping only the good ones → they created a piecewise loss function → that gives superior performance (this is gradient control).
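The piecewise definition can be sketched in NumPy as below → a log region for small errors and an L1-like linear region for large ones, joined continuously (the parameter values `w=10, eps=2` are illustrative; the paper tunes them):

```python
import numpy as np

def wing_loss(x, w=10.0, eps=2.0):
    """Wing loss: logarithmic (high-gradient) region for |x| < w,
    linear region for |x| >= w. The constant C makes the two
    pieces join continuously at |x| = w."""
    x = np.abs(x)
    C = w - w * np.log(1.0 + w / eps)
    return np.where(x < w, w * np.log(1.0 + x / eps), x - C)

# Small errors are amplified relative to L2; large errors grow
# only linearly, so outliers cannot dominate training.
errs = np.array([0.5, 5.0, 50.0])
print(wing_loss(errs))
```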

So this is the data augmentation method → where faces are aligned → this is very important and critical for better performance (and after this augmentation → the model was able to give much better results).
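A generic geometric-augmentation sketch → rotating landmark coordinates around the image centre; this is a hypothetical illustration of landmark-aware augmentation, not the paper's exact pose-based scheme:

```python
import numpy as np

def rotate_landmarks(image_shape, landmarks, angle_deg):
    """Rotate (x, y) landmark coordinates around the image centre,
    so labels stay consistent with a rotated training image."""
    h, w = image_shape
    c = np.array([w / 2.0, h / 2.0])
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    # Row vectors: (p - c) @ R.T applies R to each point.
    return (landmarks - c) @ R.T + c

# A point 1 px right of centre rotates to 1 px below centre (y-down).
out = rotate_landmarks((100, 100), np.array([[51.0, 50.0]]), 90.0)
print(out)  # ~[[50., 51.]]
```

The key point → any spatial transform applied to the image must be applied to the coordinate labels as well.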

There is another approach → two-stage localisation → which is another game in itself. (They used MATLAB → interesting → must have been the old days.) (Learning rates were reduced during training → this is more or less standard.)

The wing loss is good → but it loses on the Challenging subset → interesting, why would this happen? (Additionally, we can see that the special data augmentation is definitely needed → it increases performance a lot.)

300-W dataset → wing loss gave the best results. (A deeper ResNet gave better results.) Fast as well → at evaluation time.

A lot of loss functions were analysed → and they proposed a data augmentation strategy as well as the new loss function.