Loss functions in machine learning and deep learning.

Source: Deep Learning on Medium


Author: Nabil Affo

What is a loss function?

When we build a machine learning or deep learning system, we aim to solve a problem by learning from labeled data. We learn by using training data to come up with a predictor function f that maps inputs to labels. Our model is useful only if it also works on data it has not seen.

We perform operations on our data and output a prediction. Let’s assume we have a set of data x on which we fit the predictor function f(x) = W*x + b (W and b are the parameters that we adjust during the learning process). In the beginning, we compute the output by simply guessing the parameters, and we get the result y_hat. The ideal situation is that y_hat is equal to the actual output y: our prediction would be correct, and we would be very lucky. Knowing the error we made (e = y - y_hat) tells us how far we are from the actual result, so that we can adjust our parameters to get closer to the right prediction. The error is computed for each input, and all the errors are then accumulated and passed through the loss function. A loss function quantifies how far we are from reality.
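As a minimal sketch of the idea above, the snippet below evaluates a linear predictor with guessed parameters and computes the error for each input. The toy data, the scalar parameters `w` and `b`, and the function name `predict` are assumptions made for illustration; they are not taken from the article.

```python
def predict(x, w, b):
    """Linear predictor f(x) = w * x + b for a single scalar input."""
    return w * x + b

# Toy data, made up for illustration: the true relation is y = 2 * x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# Initial guess for the parameters, before any learning has happened.
w, b = 1.0, 0.0

# Predictions y_hat and per-input errors e = y - y_hat.
y_hats = [predict(x, w, b) for x in xs]
errors = [y - y_hat for y, y_hat in zip(ys, y_hats)]

print(y_hats)  # [0.0, 1.0, 2.0, 3.0]
print(errors)  # [1.0, 2.0, 3.0, 4.0]
```

The errors grow with x here because the guessed slope is too small; a loss function condenses this list of errors into a single number.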

Some loss functions…

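There is no universal loss function that suits all the data we may encounter. Nevertheless, as a general rule, the loss over a dataset can be formulated as the average of a per-example penalty that compares each prediction with the actual output.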
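One common way to express such a general rule is to average a per-example penalty over the dataset. The sketch below assumes that form; the names `average_loss` and `squared_error` are hypothetical and chosen only for this example.

```python
def average_loss(per_example_loss, ys, y_hats):
    """Average a per-example penalty over all (actual, predicted) pairs."""
    total = sum(per_example_loss(y, y_hat) for y, y_hat in zip(ys, y_hats))
    return total / len(ys)

def squared_error(y, y_hat):
    """One possible per-example penalty: the squared difference."""
    return (y - y_hat) ** 2

# Two examples with squared errors 1.0 and 4.0.
result = average_loss(squared_error, [1.0, 3.0], [0.0, 1.0])
print(result)  # 2.5
```

Swapping `squared_error` for another penalty gives a different loss without changing the averaging step.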

We can distinguish between regression loss functions (predicting a real-valued quantity) and classification loss functions (predicting a label from a set of two or more classes).

Regression loss functions

Mean Square Error (MSE)

It measures the average of the squared differences between the predictions and the actual outputs. It is never negative, and values closer to zero are better.
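A minimal sketch of the MSE in plain Python, using made-up values; the function name `mean_squared_error` is chosen for this example only.

```python
def mean_squared_error(ys, y_hats):
    """Average of the squared differences between actual and predicted values."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats)]
    return sum(squared) / len(squared)

ys = [1.0, 3.0, 5.0, 7.0]       # actual outputs
y_hats = [0.0, 1.0, 2.0, 3.0]   # predictions

mse_value = mean_squared_error(ys, y_hats)
print(mse_value)  # 7.5
```

Because the differences are squared, the largest error (4.0) contributes 16 of the 30 total, so the MSE is sensitive to outliers.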

Mean Absolute Error (MAE)

It measures the average of the absolute values of the differences between the predictions and the actual outputs, that is, the average distance between predictions and actual values. This loss function is also never negative.
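A minimal sketch of the MAE in plain Python, on the same kind of made-up values; the function name `mean_absolute_error` is chosen for this example only.

```python
def mean_absolute_error(ys, y_hats):
    """Average of the absolute differences between actual and predicted values."""
    absolute = [abs(y - y_hat) for y, y_hat in zip(ys, y_hats)]
    return sum(absolute) / len(absolute)

ys = [1.0, 3.0, 5.0, 7.0]       # actual outputs
y_hats = [0.0, 1.0, 2.0, 3.0]   # predictions

mae_value = mean_absolute_error(ys, y_hats)
print(mae_value)  # 2.5
```

Each error counts in proportion to its size, so a single large error weighs less here than it would under a squared penalty.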

Classification loss functions

Cross Entropy Loss / Log Loss (LL)

Cross Entropy Loss is the negative log-likelihood of the actual labels under the model. The model outputs a probability value between 0 and 1. In the binary case, y is the actual label (0 or 1) and p is the predicted probability, and the loss for one example is -(y*log(p) + (1 - y)*log(1 - p)). The Cross Entropy Loss increases as the predicted probability diverges from the actual label. It is a commonly used loss function in classification problems.
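The sketch below assumes the binary form of the loss, -(y*log(p) + (1 - y)*log(1 - p)), averaged over the examples. The function name `binary_cross_entropy` and the clipping constant `eps` are assumptions added for this example; the clipping only guards against taking log(0).

```python
import math

def binary_cross_entropy(ys, ps, eps=1e-12):
    """Average binary cross entropy for labels ys (0 or 1) and probabilities ps."""
    total = 0.0
    for y, p in zip(ys, ps):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(ys)

# Same labels, two sets of predicted probabilities for the positive class.
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])

print(round(confident_right, 4))  # 0.1054
print(round(confident_wrong, 4))  # 2.3026
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong.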

Hinge Loss / Support Vector Machine Loss (SVM loss)

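It is mainly used for “maximum-margin” classification, most notably with support vector machines (SVMs). s denotes the scores that the classifier assigns to the different categories. The loss pushes the score of the correct category to exceed the scores of the incorrect categories by a margin of at least 1. It penalizes incorrect predictions, and also correct predictions that fall within the margin, though the latter are penalized less. This loss function is easy to compute, but its scores are not probabilities, so it says little about how confident a prediction is.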
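As a sketch, the multiclass form of this loss for one example can be written as the sum of max(0, s_j - s_correct + 1) over the incorrect categories j. That formula is an assumption consistent with the description above (scores s, margin of 1); the function name `multiclass_hinge_loss` and the example scores are made up for illustration.

```python
def multiclass_hinge_loss(scores, correct_index, margin=1.0):
    """Hinge loss for one example, given a score per category."""
    correct_score = scores[correct_index]
    return sum(
        max(0.0, s - correct_score + margin)
        for j, s in enumerate(scores)
        if j != correct_index
    )

# The correct category is index 0 in all three cases.
clear_win = multiclass_hinge_loss([3.0, 1.0, 0.5], 0)      # wins by more than the margin
inside_margin = multiclass_hinge_loss([3.0, 2.5, 0.5], 0)  # wins, but by less than 1
wrong = multiclass_hinge_loss([1.0, 3.0, 0.5], 0)          # another category scores higher

print(clear_win)      # 0.0
print(inside_margin)  # 0.5
print(wrong)          # 3.5
```

The second case shows that a correct prediction still incurs a small loss when it does not beat the other scores by the full margin.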

A question we could ask is which loss function to use for our model. It depends on factors such as the choice of model and algorithm, time efficiency, and how much confidence we need in the predictions. In the end, it is a trade-off.

To sum up, loss functions give an idea of how well our model performs. We use that information to minimize the errors of our model by adopting an optimization strategy such as gradient descent.
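To illustrate how a loss guides the optimization, here is a minimal sketch of gradient descent minimizing the MSE of a linear predictor f(x) = w*x + b. The toy data, the learning rate, the number of steps, and the function names are all assumptions chosen for this example.

```python
def mse(xs, ys, w, b):
    """Mean squared error of the predictor w * x + b on the data."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_step(xs, ys, w, b, learning_rate):
    """One gradient descent update of w and b on the MSE."""
    n = len(xs)
    # Partial derivatives of the MSE with respect to w and b.
    grad_w = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    grad_b = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Toy data, made up for illustration: the true relation is y = 2 * x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 1.0, 0.0  # initial guess
initial_loss = mse(xs, ys, w, b)

for _ in range(2000):
    w, b = gradient_step(xs, ys, w, b, learning_rate=0.05)

final_loss = mse(xs, ys, w, b)
print(initial_loss)  # 7.5
print(final_loss < initial_loss)  # True
```

After the loop, `w` and `b` end up very close to 2 and 1, the values used to generate the toy data, and the loss is close to zero.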