Deep Learning: Regularization Techniques to Reduce Overfitting

Original article was published on Deep Learning on Medium

So, What is Regularization?

In the context of machine learning, regularization discourages a model from becoming overly complex or flexible, which makes it less likely to overfit.

Deep learning models have so much flexibility and capacity that overfitting can be a severe problem if the training dataset is not big enough. The network may do well on the training set, yet fail to generalize to new examples that it has never seen.

There are several regularization techniques that can help to overcome overfitting. They are:

1. L2 Regularization

A standard way to avoid overfitting is L2 regularization. It consists of modifying your cost function by adding a term that penalizes the squared values of the weights, scaled by a hyperparameter λ:

Cost function with the L2 regularization term
  • The value of λ is a hyperparameter that you can tune using a dev set.
  • L2 Regularization makes your decision boundary smoother. If λ is too large, it is also possible to “oversmooth”, resulting in a model with high bias.
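
As a sketch of what the regularized cost looks like, assume an unregularized cost J averaged over m training examples and one weight matrix W^{[l]} per layer l = 1, ..., L. This notation is an assumption made for illustration. One common form of the L2-regularized cost is then:

```latex
J_{\text{regularized}} = J + \frac{\lambda}{2m} \sum_{l=1}^{L} \left\lVert W^{[l]} \right\rVert_F^2
```

Here the squared Frobenius norm is simply the sum of the squared entries of each weight matrix. Setting λ = 0 recovers the original cost.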

What is L2 regularization actually doing?

L2 regularization relies on the assumption that a model with small weights is simpler than a model with large weights. Thus, by penalizing the squared values of the weights in the cost function, you drive all the weights toward smaller values. Large weights become too expensive to keep! This leads to a smoother model in which the output changes more slowly as the input changes.
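
A minimal NumPy sketch of this idea follows. It assumes the penalty takes the common form (λ / 2m) times the sum of squared weights. The function names `l2_penalty` and `l2_gradient` and the toy numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def l2_penalty(weights, lam, m):
    """L2 term added to the cost: (lam / (2 * m)) * sum of all squared weights."""
    return (lam / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)

def l2_gradient(W, lam, m):
    """Extra gradient the penalty contributes for one weight matrix: (lam / m) * W."""
    return (lam / m) * W

# Toy example: two small weight matrices and 4 training examples.
weights = [np.array([[1.0, -2.0], [0.5, 0.0]]), np.array([[3.0, 1.0]])]
base_cost = 0.30   # hypothetical unregularized cost
lam, m = 0.7, 4    # lam is the hyperparameter you would tune on a dev set

regularized_cost = base_cost + l2_penalty(weights, lam, m)
print(regularized_cost)
```

Because the extra gradient is proportional to the weight itself, every gradient descent step shrinks each weight a little, which is why L2 regularization is often called weight decay.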

Another regularization method is dropout. Let’s see what it does to our model.

2. Dropout

Dropout is a widely used regularization technique that is specific to deep learning. It randomly shuts down some neurons in each iteration.

Look at these two images to see what this means:

Comparison between a standard neural net and a neural net with dropout
Dropout Network showing the neurons get dropped from layers

When you shut some neurons down, you actually modify your model. The idea behind dropout is that at each iteration, you train a different model that uses only a subset of your neurons. With dropout, your neurons thus become less sensitive to the activation of one other specific neuron, because that other neuron might be shut down at any time.


  • A common mistake when using dropout is to use it both in training and testing. You should use dropout (randomly eliminate nodes) only in training.

What you should remember about dropout:

  • Dropout is a regularization technique.
  • You only use dropout during training. Don’t use dropout (randomly eliminate nodes) during test time.
  • Apply dropout both during forward and backward propagation.
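
The points above can be sketched in NumPy as follows. This sketch uses the "inverted dropout" variant, which also divides the kept activations by the keep probability so that their expected value stays the same. That scaling step, the name `keep_prob`, and the function names are assumptions for illustration and are not taken from the text above.

```python
import numpy as np

def dropout_forward(a, keep_prob, rng, training=True):
    """Apply inverted dropout to activations `a`.

    Returns the (possibly) dropped activations and the mask that was used.
    At test time the activations pass through unchanged and the mask is None.
    """
    if not training:
        return a, None
    mask = rng.random(a.shape) < keep_prob   # True = neuron is kept
    return a * mask / keep_prob, mask

def dropout_backward(da, mask, keep_prob):
    """Apply the same mask and scaling to the gradient as in the forward pass."""
    if mask is None:
        return da
    return da * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 5))                          # toy activations of one layer

a_train, mask = dropout_forward(a, keep_prob=0.8, rng=rng)
da = dropout_backward(np.ones_like(a), mask, keep_prob=0.8)
a_test, _ = dropout_forward(a, keep_prob=0.8, rng=rng, training=False)
```

Reusing the forward-pass mask in the backward pass ensures that a neuron that was shut down contributes no gradient in that iteration.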

Apart from L2 regularization and dropout, there are more techniques you can use to regularize your model.

One of them is data augmentation. It is a simple way to increase the diversity of the data available for training, without actually collecting new data: new training examples are created by applying transformations to the existing ones.

A plot of Augmented Images with a Horizontal Shift

Data augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks.
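
As a rough illustration, these transformations can be written in a few lines of NumPy. The sketch assumes images are arrays of shape (height, width) or (height, width, channels). The function names are hypothetical, and real projects usually rely on a library's augmentation utilities instead.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror the image left to right."""
    return img[:, ::-1]

def horizontal_shift(img, shift):
    """Shift right by `shift` pixels (left if negative), filling the gap with zeros.

    Assumes abs(shift) is smaller than the image width.
    """
    out = np.zeros_like(img)
    w = img.shape[1]
    if shift > 0:
        out[:, shift:] = img[:, :w - shift]
    elif shift < 0:
        out[:, :w + shift] = img[:, -shift:]
    else:
        out[:] = img
    return out

def pad_and_random_crop(img, pad, rng):
    """Zero-pad the borders by `pad` pixels, then crop back to the original size."""
    h, w = img.shape[:2]
    pad_width = [(pad, pad), (pad, pad)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad_width, mode="constant")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

rng = np.random.default_rng(0)
img = np.arange(12).reshape(3, 4)            # toy 3x4 "image"
augmented = [horizontal_flip(img),
             horizontal_shift(img, 1),
             pad_and_random_crop(img, pad=1, rng=rng)]
```

Each call produces a new training example with the same label as the original image, which is how the training set grows without new data collection.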

If you want to dive deeper into this technique, feel free to check the link below, a paper on data augmentation:

Last but not least, there is one more method worth knowing: early stopping.

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent.

Early stopping rules provide guidance as to how many iterations can be run before the learner begins to overfit.
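
One widely used rule can be sketched as follows: track the loss on a dev set after every epoch and stop once it has not improved for a fixed number of epochs. The use of a dev-set loss, the `patience` parameter, and the function name are assumptions for illustration. The text above does not prescribe a specific rule.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch dev-set losses.

    Training stops at the first epoch where the loss has failed to improve
    for `patience` consecutive epochs. If that never happens, training runs
    to the last epoch. `best_epoch` is the epoch whose weights you would keep.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Toy dev-set losses: they improve until epoch 2, then start rising.
losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 1.1]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)
```

In practice you would save the model weights whenever the dev-set loss improves and restore the weights from `best_epoch` after stopping.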

That’s all for this basic intuition about regularization.