Vanishing gradient and exploding gradient in Neural networks

Source: Deep Learning on Medium


Author: Arun Mohan

Vanishing gradient Problem

The vanishing gradient problem is a common problem that we face while training deep neural networks. The gradients of a neural network are computed during back-propagation.

Generally, adding more hidden layers lets the network learn more complex functions, and so do a better job of predicting future outcomes. This is where deep learning makes a big difference.

During back-propagation, i.e. moving backward through the network and calculating gradients, the gradients tend to get smaller and smaller the further back we go. This means that the neurons in the earlier layers learn very slowly compared to the neurons in the later layers, so the earlier layers are the slowest to train. This is an issue for deep neural networks with a large number of hidden layers. Recall the weight update rule:

W_new = W_old - η × gradient

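For earlier layers this gradient will be very small, so there will be no significant difference between W_new and W_old. This is the vanishing gradient problem. It arises mainly with the sigmoid and tanh activation functions, whose derivatives are small over most of their input range (the sigmoid derivative is never larger than 0.25) and get multiplied together layer by layer during back-propagation. For this reason, ReLU-based activation functions are commonly used when training deep neural networks: they reduce the problem, which can improve accuracy.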
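To make the shrinking concrete, here is a minimal sketch in plain Python. It is a toy model, not a real network: each "layer" is a single scalar unit, and the weight of 1.0, the pre-activation value of 0.5, the depth of 20 and the helper name `backprop_factor` are all assumptions chosen for illustration. It multiplies the activation derivatives together, as the chain rule does, and then applies the update rule from above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def backprop_factor(activation_grad, pre_activations, weight=1.0):
    # Chain-rule factor that reaches the first layer: the product of
    # (weight * activation'(z)) over all layers. Hypothetical helper.
    factor = 1.0
    for z in pre_activations:
        factor *= weight * activation_grad(z)
    return factor


depth = 20
pre_activations = [0.5] * depth  # assumed: the same positive input at every layer

sigmoid_factor = backprop_factor(sigmoid_grad, pre_activations)
relu_factor = backprop_factor(relu_grad, pre_activations)

# Apply W_new = W_old - eta * gradient with each factor as the gradient.
eta = 0.1
w_old = 0.8
w_new_sigmoid = w_old - eta * sigmoid_factor
w_new_relu = w_old - eta * relu_factor

print("sigmoid factor:", sigmoid_factor, "-> W_new:", w_new_sigmoid)
print("relu factor:   ", relu_factor, "-> W_new:", w_new_relu)
```

Since the sigmoid derivative is at most 0.25, the factor after 20 layers cannot exceed 0.25^20, which is about 9.1e-13, so W_new is practically identical to W_old. With ReLU and positive inputs every derivative is 1, so the factor stays at 1 and the weight actually moves.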

Exploding gradient Problem

We have discussed the vanishing gradient problem. Now we turn to the exploding gradient problem. Earlier we saw what happens when the gradient becomes very small; now we will see what happens when it becomes very large.

In deep networks or recurrent neural networks, error gradients can accumulate during an update and result in very large gradients. These in turn cause large updates to the network weights and, ultimately, an unstable network. The explosion occurs through exponential growth, as gradients are repeatedly multiplied through the network layers by values larger than 1.0.
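The exponential growth can be sketched with nothing more than repeated multiplication. In the snippet below, the per-layer factors 1.5 and 0.5, the depth of 50 and the function name `gradient_magnitudes` are illustrative assumptions, not values from a real network.

```python
def gradient_magnitudes(per_layer_factor, depth, start=1.0):
    # Track the gradient magnitude as it is multiplied by the same
    # factor once per layer on the way back through the network.
    magnitudes = []
    g = start
    for _ in range(depth):
        g *= per_layer_factor
        magnitudes.append(g)
    return magnitudes


growing = gradient_magnitudes(1.5, 50)    # factor larger than 1.0: explodes
shrinking = gradient_magnitudes(0.5, 50)  # factor smaller than 1.0: vanishes

print("after 50 layers, factor 1.5:", growing[-1])
print("after 50 layers, factor 0.5:", shrinking[-1])
```

A factor of only 1.5 per layer grows to roughly 6.4e8 after 50 layers, while a factor of 0.5 shrinks to below 1e-15. The same mechanism produces both problems; only the side of 1.0 on which the factor falls differs.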

We can recognize the exploding gradient problem by the following signs:

  • The model has poor loss on the training data
  • The loss changes by a large amount from update to update
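These signs can be checked automatically on a recorded loss history. The sketch below is one possible heuristic, not a standard API: the function name `looks_unstable` and the threshold `jump_ratio=10.0` are assumptions that would need tuning for a real training run. It flags a run when the loss becomes NaN or infinite, or when it changes by more than `jump_ratio` times its previous value between two consecutive updates.

```python
import math


def looks_unstable(losses, jump_ratio=10.0):
    # Hypothetical heuristic for spotting exploding gradients from the loss curve.
    # A non-finite loss (NaN or inf) is treated as a clear sign of divergence.
    if any(not math.isfinite(loss) for loss in losses):
        return True
    # Otherwise look for a large relative change between consecutive updates.
    for prev, curr in zip(losses, losses[1:]):
        scale = max(abs(prev), 1e-12)  # avoid comparing against exactly zero
        if abs(curr - prev) > jump_ratio * scale:
            return True
    return False


stable_run = [2.0, 1.5, 1.2, 1.0, 0.9]
unstable_run = [2.0, 1.8, 45.0, 3.1, 900.0]

print(looks_unstable(stable_run))
print(looks_unstable(unstable_run))
```

The first history decreases smoothly and is not flagged; the second swings wildly from update to update and is.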

Some techniques to overcome the exploding gradient problem are:

  • Use LSTM units in RNNs, where the exploding gradient problem comes up frequently.
  • Use weight regularization.
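As a rough illustration of weight regularization, the sketch below adds an L2 penalty to a plain gradient-descent step. The function name `sgd_step`, the learning rate and the penalty strength `l2` are assumptions for this example, and L2 is only one of several possible forms of regularization. The penalty (l2 / 2) × sum(w²) contributes l2 × w to each weight's gradient, which pulls large weights back toward zero.

```python
def sgd_step(weights, grads, lr=0.1, l2=0.0):
    # One gradient-descent update. With l2 > 0, the gradient of the
    # L2 penalty (l2 * w) is added to the loss gradient for each weight.
    return [w - lr * (g + l2 * w) for w, g in zip(weights, grads)]


weights = [3.0, -2.0]
zero_grads = [0.0, 0.0]  # assumed: no loss gradient, to isolate the penalty's effect

plain = sgd_step(weights, zero_grads)            # weights stay where they are
decayed = sgd_step(weights, zero_grads, l2=0.5)  # weights shrink toward zero

print("without penalty:", plain)
print("with L2 penalty:", decayed)
```

Keeping the weights small keeps the values multiplied through the layers small as well, which is why the penalty works against the exponential growth described above.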