The Vanishing/Exploding Gradient Problem in Deep Neural Networks

Original article was published on Deep Learning on Medium


There are many approaches to addressing exploding and vanishing gradients; this section lists three that you can use.

1. Reducing the Number of Layers

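This solution can be used in both scenarios (exploding and vanishing gradients). During backpropagation, the gradient that reaches an early layer is a product of per-layer terms, so fewer layers means fewer terms that can shrink or blow up the gradient. However, by reducing the number of layers in our network, we give up some of our model's complexity, since having more layers makes the network more capable of representing complex mappings.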
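Why depth matters can be seen with a toy example. The plain-Python sketch below assumes a deliberately simplified "network" made of a chain of scalar sigmoid layers, h_k = sigmoid(w * h_{k-1}); the function name `input_gradient` is hypothetical. By the chain rule, the gradient of the output with respect to the input is the product of one local derivative per layer, and since the sigmoid's derivative is at most 0.25, with w = 1 every extra layer multiplies the gradient by at most 0.25.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth: int, weight: float, x: float = 0.5) -> float:
    """Gradient of the output of a chain of `depth` scalar layers
    h_k = sigmoid(weight * h_{k-1}) with respect to the input x.

    By the chain rule this is the product of the per-layer local
    derivatives weight * sigmoid'(weight * h_{k-1}).
    """
    h = x
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(weight * h)
        grad *= weight * s * (1.0 - s)  # local derivative of this layer
        h = s
    return grad


# The gradient shrinks geometrically as the chain gets deeper.
for depth in (2, 10, 30):
    print(depth, input_gradient(depth, weight=1.0))
```

Removing layers removes factors from this product. The same argument runs in reverse for exploding gradients: if the per-layer factors are consistently larger than 1, the product grows geometrically instead of shrinking.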

2. Gradient Clipping (Exploding Gradients)

Checking for and limiting the size of the gradients whilst our model trains is another solution: whenever the gradients exceed a chosen threshold, they are rescaled (or capped) before the weights are updated. Going into the details of this technique is beyond the scope of this article, but you can read more about gradient clipping in an article by Wanshun Wong titled "What is Gradient Clipping".
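A minimal sketch of one common variant, clipping by global norm, follows. It is written in plain Python and assumes the gradients arrive as a list of per-parameter lists of floats; the name `clip_by_global_norm` is illustrative, not any library's API. If the combined L2 norm of all gradients exceeds `max_norm`, every gradient is multiplied by the same factor `max_norm / norm`.

```python
import math


def clip_by_global_norm(grads: list[list[float]], max_norm: float) -> tuple[list[list[float]], float]:
    """Rescale gradient vectors so that their combined L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    if max_norm <= 0:
        raise ValueError("max_norm must be positive")
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm:
        # Small enough already: return an unchanged copy.
        return [list(vec) for vec in grads], total_norm
    scale = max_norm / total_norm  # one common factor for every gradient
    return [[g * scale for g in vec] for vec in grads], total_norm


clipped, norm = clip_by_global_norm([[3.0, 4.0], [0.0, 12.0]], max_norm=6.5)
print(norm)     # 13.0
print(clipped)  # [[1.5, 2.0], [0.0, 6.0]]
```

Because all gradients are scaled by one common factor, the direction of the update is preserved, which is one reason norm clipping is often preferred over clipping each entry to a fixed range (clipping by value). Frameworks ship built-in versions, for example `torch.nn.utils.clip_grad_norm_` in PyTorch and the `clipnorm`/`clipvalue` optimizer arguments in Keras.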

3. Weight Initialization

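Choosing the random initialization of your network's weights more carefully is a partial solution: it helps, but it does not solve the problem completely. Schemes such as Xavier (Glorot) and Kaiming (He) initialization scale the initial weights according to the number of inputs to each layer, so that the variance of the activations stays roughly constant from layer to layer. Check out this article by James Dellinger, "Weight Initialization in Neural Networks: A journey from the basics to Kaiming".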
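To see the effect, the plain-Python sketch below pushes one random input through 20 layers and reports the root-mean-square size of the final activations for two initializations. It assumes a toy stack of bias-free, fully connected ReLU layers of equal width, and all function names are hypothetical. The first initialization is Kaiming/He, which draws weights from a normal distribution with variance 2 / fan_in. The second is a naive fixed standard deviation of 0.01.

```python
import math
import random
from typing import Callable


def init_layer(fan_in: int, fan_out: int, std: float, rng: random.Random) -> list[list[float]]:
    """Return a fan_out x fan_in weight matrix with entries drawn from N(0, std**2)."""
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def relu_layer(x: list[float], weights: list[list[float]]) -> list[float]:
    """One bias-free fully connected layer followed by a ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def rms(v: list[float]) -> float:
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(a * a for a in v) / len(v))


def final_activation_rms(std_fn: Callable[[int], float], width: int = 64,
                         depth: int = 20, seed: int = 0) -> float:
    """Push one random input through `depth` ReLU layers whose weights use std_fn(fan_in)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        x = relu_layer(x, init_layer(width, width, std_fn(width), rng))
    return rms(x)


def he_std(fan_in: int) -> float:
    # Kaiming/He initialization for ReLU layers: Var(w) = 2 / fan_in.
    return math.sqrt(2.0 / fan_in)


def naive_std(fan_in: int) -> float:
    # A fixed "small random numbers" choice that ignores the layer size.
    return 0.01


print("He init, final RMS:   ", final_activation_rms(he_std))
print("Naive init, final RMS:", final_activation_rms(naive_std))
```

With the fixed 0.01 standard deviation, each layer shrinks the activations by a roughly constant factor. After 20 layers they are vanishingly small, and so are the gradients flowing back through them. He initialization keeps the activations on the order of the input instead. The exact numbers depend on the random seed.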


In this post we learnt what exploding and vanishing gradients are, how to detect them, and some ways to address them. I did not go into much detail about recurrent neural networks (RNNs), which are especially prone to vanishing gradients; useful resources for learning more about them are linked below.

Other Resources

Chi-Feng Wang, The Vanishing Gradient problem

Eniola Alese, The curious case of the vanishing & exploding gradient