Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks

Source: Deep Learning on Medium

This time the landmarks are predicted directly as coordinates, but with a different loss function.

This loss function is piecewise and concentrates on certain error regions, which is similar in spirit to what the Facebook research team did. (Coordinates are hard to predict.)

How the loss function looks when compared to other loss functions such as L1 or L2. (The shape changes with the parameter setting, but at the end of the day the focus is on the small and medium error regions.)

Since facial landmark localisation is very important, it is a good idea to research this area. (Multiple network architectures have been used, such as RNNs, CNNs and more.)

L2 loss is sensitive to outliers, and this can hurt how the model performs on facial landmark detection. (This is not a good thing.) (They do not use heatmaps; they use direct coordinate information instead.) (Most methods are regression-based and predict the coordinates directly.)
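To make the outlier point concrete, here is a minimal sketch with made-up per-landmark errors (the numbers are hypothetical, not from the paper). It assumes the L2 loss is written as 0.5 * x², so its gradient is just x, and compares how much of the total gradient a single bad landmark takes up under L2 and under L1.

```python
import numpy as np

# Hypothetical per-landmark errors (prediction minus target), in pixels.
# Four landmarks are nearly right; the last one is an outlier.
errors = np.array([0.5, -0.3, 0.2, -0.4, 20.0])

# Gradient of the L2 loss 0.5 * x**2 with respect to each error is x itself.
l2_grad = errors
# Gradient of the L1 loss |x| is sign(x): every error pulls equally hard.
l1_grad = np.sign(errors)

# Share of the total gradient magnitude that comes from the outlier.
l2_share = np.abs(l2_grad[-1]) / np.abs(l2_grad).sum()
l1_share = np.abs(l1_grad[-1]) / np.abs(l1_grad).sum()

print(f"outlier share of L2 gradient: {l2_share:.2f}")
print(f"outlier share of L1 gradient: {l1_share:.2f}")
```

With L2 the single outlier accounts for almost all of the update, while with L1 it counts the same as any other landmark.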

Some of the network architectures are hourglass networks and stacked autoencoders.

One of the reasons this problem is so hard is VARIATION. (There are many different poses, and not all of them can be predicted correctly.) (Other issues such as occlusion also make the problem hard.)

But in general, they use stacked networks.

The shape vector could be in heatmap format, but this paper does not use heatmaps.
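As a rough illustration of a coordinate-style shape vector, the sketch below flattens a handful of (x, y) landmarks into one vector that a regression network could output. The landmark values are invented, and the ordering (all x values first, then all y values) is an assumption made for the example.

```python
import numpy as np

# Hypothetical example: 3 landmarks given as (x, y) pixel coordinates.
landmarks = np.array([[30.0, 40.0],
                      [70.0, 42.0],
                      [50.0, 80.0]])

# Shape vector of length 2 * num_landmarks: [x1, x2, x3, y1, y2, y3].
# The x-then-y ordering is an assumption for this sketch.
shape_vector = np.concatenate([landmarks[:, 0], landmarks[:, 1]])

# Going back from the flat vector to one (x, y) row per landmark.
recovered = np.stack(np.split(shape_vector, 2), axis=1)

print(shape_vector)
print(recovered)
```

A heatmap representation would instead need one 2D map per landmark, which is much larger than these 2L numbers.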

The above is the network architecture. Speed is also important. The input is a face image, so it seems there is another network that crops the faces first.

Multiple papers used L2 loss, which was not a good idea. Wing loss is a better choice.

The loss is noticeably smaller when compared to the other loss functions. Smooth L1 loss is another type of loss function used in the comparison.
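For reference, the baseline losses mentioned here can be sketched in a few lines of NumPy. This uses the common textbook definitions (L2 as 0.5 * x², smooth L1 switching from quadratic to linear at |x| = 1); the function names are just for this example.

```python
import numpy as np

def l2_loss(x):
    # Quadratic everywhere: large errors are penalised very heavily.
    return 0.5 * x ** 2

def l1_loss(x):
    # Linear everywhere: constant gradient magnitude.
    return np.abs(x)

def smooth_l1_loss(x):
    # Quadratic for |x| < 1, linear (shifted by 0.5) beyond that.
    abs_x = np.abs(x)
    return np.where(abs_x < 1.0, 0.5 * abs_x ** 2, abs_x - 0.5)

x = np.array([0.1, 0.5, 1.0, 5.0])
for name, fn in [("L2", l2_loss), ("L1", l1_loss), ("smooth L1", smooth_l1_loss)]:
    print(name, fn(x))
```

Smooth L1 behaves like L2 near zero and like L1 for large errors, so it tames outliers but still has a vanishing gradient for small errors.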

The above is the Wing loss, and it has hyperparameters (the width w and the curvature ε) that need to be set beforehand. (It would be better if there were none to set, like focal loss.)
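Written out, the Wing loss is w · ln(1 + |x|/ε) when |x| < w, and |x| − C otherwise, where C = w − w · ln(1 + w/ε) joins the two pieces smoothly. Below is a minimal NumPy sketch of that formula. The default values w = 10 and ε = 2 are only example settings chosen for the sketch, and the function name is my own.

```python
import numpy as np

def wing_loss(x, w=10.0, epsilon=2.0):
    """Element-wise Wing loss for errors x (prediction minus target).

    w and epsilon are example defaults for this sketch, not tuned values.
    """
    abs_x = np.abs(x)
    # Constant that makes the log part and the linear part meet at |x| = w.
    c = w - w * np.log(1.0 + w / epsilon)
    return np.where(abs_x < w,
                    w * np.log(1.0 + abs_x / epsilon),  # small/medium errors
                    abs_x - c)                          # large errors: L1-like

errors = np.array([0.0, 0.5, 5.0, 10.0, 20.0])
print(wing_loss(errors))
```

In training, this would be applied to every coordinate of the shape vector and summed or averaged, the same way an L1 or L2 loss would be.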

The general idea behind this paper is to make sure the gradient is not dominated by the larger errors, so that the network still makes precise predictions on the small and medium ones.
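One way to see this is to compare gradient magnitudes directly. The sketch below assumes the Wing loss has the form w · ln(1 + |x|/ε) for |x| < w and is linear beyond that, so its gradient magnitude is w / (ε + |x|) inside the log region and 1 outside. The error values and the settings w = 10, ε = 2 are illustrative only.

```python
import numpy as np

def wing_grad_magnitude(x, w=10.0, epsilon=2.0):
    # |d/dx| of w * ln(1 + |x| / epsilon) is w / (epsilon + |x|);
    # in the linear region the magnitude is 1, as for L1.
    abs_x = np.abs(x)
    return np.where(abs_x < w, w / (epsilon + abs_x), 1.0)

# Hypothetical errors from small to large.
errors = np.array([0.1, 1.0, 5.0, 20.0])

l2 = np.abs(errors)          # gradient magnitude of 0.5 * x**2
l1 = np.ones_like(errors)    # gradient magnitude of |x|
wing = wing_grad_magnitude(errors)

for e, g2, g1, gw in zip(errors, l2, l1, wing):
    print(f"error={e:5.1f}  L2={g2:5.2f}  L1={g1:4.2f}  wing={gw:4.2f}")
```

Under L2 the largest error gets the largest gradient, under L1 all errors get the same, and under the Wing form the small errors get the largest gradient while large errors fall back to L1 behaviour.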

And the results differ depending on how the parameters are set.

L1 performs well too, but not as well as Wing loss. Translation really makes things hard.

State of the art is 3 something, but this is impressive since we are directly using coordinates and not heatmaps. (The authors used data augmentation, but not a stylized GAN.)

This approach is good, since swapping in the loss is simple and the method is fast.