Learning Rate in Deep Learning

How can you tune the learning rate in deep learning so that your model trains faster?

The learning rate is a hyperparameter in deep learning that you tune to control how fast or slow training proceeds. It scales the updates applied to the parameters (weights and biases) we are trying to optimize.
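To make this concrete, here is a minimal sketch in Python (the names `w`, `b`, `dw`, `db` for the parameters and their gradients are illustrative) of how the learning rate scales a single gradient descent update:

```python
import numpy as np

def gradient_descent_step(w, b, dw, db, alpha):
    """Apply one gradient descent update, scaled by the learning rate alpha."""
    w = w - alpha * dw  # a larger alpha takes a bigger step along the negative gradient
    b = b - alpha * db
    return w, b

# Example: a single update with learning rate 0.01
w, b = np.array([0.5, -0.3]), 0.1
dw, db = np.array([0.2, -0.1]), 0.05
w, b = gradient_descent_step(w, b, dw, db, alpha=0.01)
```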

In this article I will talk about how you can slowly reduce the learning rate of your training algorithm over time. This technique is known as learning rate decay.

Why might this help? Say you are training your neural network with a gradient descent algorithm (via back-propagation). With a constant learning rate, your steps stay the same size even when you are close to the minimum, so you tend to overshoot it and oscillate around it rather than settle into it. If you instead keep slowly reducing the learning rate (alpha), then by the time you are close to the minimum you take only small steps, wandering within a tight region and eventually converging near the minimum. This is the intuition behind learning rate decay.
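As a toy sketch of this intuition (the quadratic objective and the inverse-time schedule here are illustrative choices, not from the article), consider minimizing f(w) = w² with gradient descent. A too-large constant learning rate keeps jumping across the minimum at w = 0, while a decaying learning rate settles into it:

```python
def run(decay_rate):
    # Minimize f(w) = w**2, whose gradient is 2*w, by gradient descent.
    w, alpha0 = 5.0, 1.0
    for t in range(50):
        alpha = alpha0 / (1.0 + decay_rate * t)  # decay_rate=0 keeps alpha constant
        w = w - alpha * 2 * w
    return w

print("constant alpha:", run(decay_rate=0.0))  # jumps between +5 and -5, never settles
print("decaying alpha:", run(decay_rate=0.2))  # shrinking steps settle at the minimum
```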

Implementation of Learning Rate Decay

a. Linear decay of the learning rate
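One common linear schedule (a sketch; the parameter names are illustrative) anneals the learning rate from its initial value alpha0 down to zero over a fixed number of epochs:

```python
def linear_decay(alpha0, epoch, total_epochs):
    # alpha falls linearly from alpha0 (epoch 0) to 0 (final epoch).
    return alpha0 * (1.0 - epoch / total_epochs)

# e.g. linear_decay(0.1, epoch=50, total_epochs=100) -> 0.05
```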

b. Other decay schedules for the learning rate
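A few other widely used schedules, sketched here with illustrative hyperparameter values, include inverse time decay (the 1 / (1 + decay_rate * epoch) form), exponential decay, inverse square-root decay, and a discrete staircase that drops alpha every few epochs:

```python
import math

def inverse_time_decay(alpha0, epoch, decay_rate=1.0):
    # alpha0 / (1 + decay_rate * epoch): shrinks quickly early, slowly later.
    return alpha0 / (1.0 + decay_rate * epoch)

def exponential_decay(alpha0, epoch, decay_rate=0.95):
    # Multiply alpha by a constant factor (< 1) every epoch.
    return (decay_rate ** epoch) * alpha0

def inverse_sqrt_decay(alpha0, epoch, k=1.0):
    # Shrink alpha in proportion to 1 / sqrt(epoch); assumes epoch >= 1.
    return (k / math.sqrt(epoch)) * alpha0

def staircase_decay(alpha0, epoch, drop=0.5, epochs_per_drop=10):
    # Halve alpha every epochs_per_drop epochs (a discrete "staircase").
    return alpha0 * (drop ** (epoch // epochs_per_drop))
```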

So, when you implement a deep learning model, you can decay the learning rate according to one of these schedules, guided by the intuition above.

Sources

  1. Deep Learning Specialization by Andrew Ng & team.