Problem: Overfitting, Solution: Regularization

Original article can be found here (source): Artificial Intelligence on Medium

What makes a model overfitting and how we can solve this issue

We all have friends who tell stories in excruciating detail. When you ask them about a movie they saw recently, you may end up learning not only about the movie but also about the watch worn by the guy selling popcorn at the movie theater. On the other hand, we have uncommunicative friends who would just tell you that the movie was “good” or “bad”.

Overfitting and underfitting are similar to those two types of friends in terms of the amount of detail. An overfit machine learning model tries to pick up every detail in the training data, whereas an underfit one is too general and tends to miss important trends in the training data.

A more machine-learning-like example would be predicting the object in an image. Assume we are trying to build a model that detects tomatoes in images, and two candidate models have learned the following features:

• Model A: Red, circle, green star shape on top, a few water droplets
• Model B: Red, circle

The problem with model A is that not all tomatoes have water droplets on them. This model is too specific and is likely to pick out only wet tomatoes; it does not generalize well to all tomatoes. Since it looks for water droplets, it cannot detect dry tomatoes in an image. It is overfitting.

On the other hand, model B thinks everything that is red and circular is a tomato, which is not true. This model is too general and unable to detect the critical features of tomatoes. It is underfitting.

These examples are not exactly how a machine learning model learns and predicts, but they give an overview of overfitting and underfitting. In this post, I will cover the following concepts in detail:

• Bias and variance
• Overfitting and underfitting
• Regularization

Bias and Variance

Bias and variance are essential to understanding overfitting and underfitting.

Bias is a measure of how far the average prediction is from the real values. Bias arises when we try to model a complex relation with a very simple model. The predictions of a model with high bias are all very similar. Since it is not sensitive to the variations within the data, the accuracy of the model is low on both training data and test data (previously unseen data).

The blue dots are the observations in the training set and the red line is our biased model, which ignores the fluctuations within the two features of the observations and makes roughly the same prediction everywhere.
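A rough sketch of high bias (my own illustrative example, not from the article): fitting a straight line to data generated from a quadratic relation. The line is too simple for the relation, so its training error stays high no matter how well it is fit, while a model of matching complexity does far better.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic ground truth with a little noise
x = np.linspace(-3, 3, 50)
y = x**2 + rng.normal(0, 0.3, size=x.shape)

# A straight line (degree 1) is too simple for this relation: high bias
line = np.polyval(np.polyfit(x, y, deg=1), x)
# A quadratic (degree 2) matches the underlying complexity
quad = np.polyval(np.polyfit(x, y, deg=2), x)

mse_line = np.mean((y - line) ** 2)  # large: the model misses the curvature
mse_quad = np.mean((y - quad) ** 2)  # small: the model fits the true trend
print(f"linear MSE: {mse_line:.2f}, quadratic MSE: {mse_quad:.2f}")
```

The key point: the linear model's error is dominated by bias, and no amount of extra training data removes it.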

Variance is the opposite of bias in terms of sensitivity to changes within the data. A model with high variance is highly sensitive to even small changes in the training data. It tries to pick up every small detail, so even tiny changes in the training data change the model. Models with high variance also tend to capture noise in the data; even an outlier falls within the scope of the model.

As you can see, the model tries to adjust to every variation within the data. Predictions of a model with high variance are widely spread out. Such a model has very high accuracy on the training set. However, it will perform poorly on new, previously unseen observations.
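High variance can be sketched the same way (again my own illustrative example): a high-degree polynomial fit to a small, noisy sample bends through nearly every training point, fitting the noise as well as the trend. Its training error is tiny, but its error on fresh data from the same distribution is noticeably larger.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small noisy sample from a simple linear trend
x_train = np.linspace(-3, 3, 15)
y_train = x_train + rng.normal(0, 0.5, size=x_train.shape)
x_test = np.linspace(-3, 3, 100)
y_test = x_test + rng.normal(0, 0.5, size=x_test.shape)

# A degree-9 polynomial has enough flexibility to chase the noise
coeffs = np.polyfit(x_train, y_train, deg=9)

train_mse = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
test_mse = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

Rerunning with a different random seed changes the fitted curve substantially, which is exactly what "high variance" means: the model depends heavily on the particular sample it saw.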

Machine learning models are built to work on previously unseen observations, so models with high variance are not acceptable. We can also never rely on models with high bias. Therefore, it is crucial to find the balance between bias and variance. There is always a trade-off: we can easily build a model with high bias and low variance, or one with low bias and high variance. However, the success of a model depends on finding the optimal point between the two. For example, the model below seems like a good fit.
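One common way to look for that balance (a sketch under my own assumptions, not the article's code) is to hold out a validation set and compare models of increasing complexity. Too simple a model underfits, too complex a model overfits, and validation error tends to be smallest somewhere in between.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy cubic data, split into train and validation sets
x = np.linspace(-3, 3, 60)
y = x**3 - 2 * x + rng.normal(0, 1.0, size=x.shape)
idx = rng.permutation(len(x))
train, val = idx[:40], idx[40:]

def val_mse(degree):
    """Fit a polynomial of the given degree on the training split
    and return its mean squared error on the validation split."""
    coeffs = np.polyfit(x[train], y[train], deg=degree)
    return np.mean((y[val] - np.polyval(coeffs, x[val])) ** 2)

# Degree 1 underfits (high bias), degree 12 overfits (high variance);
# degree 3 matches the underlying relation
errors = {d: val_mse(d) for d in (1, 3, 12)}
print(errors)
```

Regularization, covered next, attacks the same trade-off from a different angle: instead of choosing among models of different sizes, it constrains a flexible model so it cannot chase the noise.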