Long Short Term Memory Maths — Part 2

Original article was published by Vidisha Jitani on Artificial Intelligence on Medium

Backward Propagation

The goal of backward propagation is to figure out how to tune our variables to decrease the error.

I’ve purposely removed the biases from the equations to simplify things. Also, for ease of calculation, I’m only considering an LSTM with a single input.

As explained earlier, the chain rule and basic differential calculus are all we need to differentiate the equations.

CAUTION: The idea behind these equations is just to build an intuition for the chain rule and to figure out how to calculate the derivatives. There might be some mathematical errors here (I’m just a software engineer, not a maths pro), so please do not copy them directly. They are not copy-pasted from any other site but computed manually. Feel free to comment below if you spot any issues.
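To make the chain-rule idea concrete, here is a minimal Python sketch of one time step of a single-input LSTM without biases. Everything in it is my own assumption for illustration and not necessarily the notation used above: the gate names (`a` for the candidate, `i`, `f`, `o` for the input, forget and output gates), the weight names (`w*` for the input, `u*` for the previous hidden state), the squared-error loss and the numbers.

```python
import math

GATES = ("a", "i", "f", "o")  # candidate, input, forget, output (assumed names)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(p, x, h_prev, c_prev):
    """One bias-free LSTM step on scalars; returns every intermediate value."""
    z = {g: p["w" + g] * x + p["u" + g] * h_prev for g in GATES}
    a = math.tanh(z["a"])
    i = sigmoid(z["i"])
    f = sigmoid(z["f"])
    o = sigmoid(z["o"])
    c = a * i + f * c_prev   # new cell state
    h = math.tanh(c) * o     # new hidden state (the output)
    return {"a": a, "i": i, "f": f, "o": o, "c": c, "h": h}


def loss(p, x, h_prev, c_prev, y):
    """Assumed error: E = 0.5 * (h - y)^2."""
    return 0.5 * (forward(p, x, h_prev, c_prev)["h"] - y) ** 2


def backward(p, x, h_prev, c_prev, y):
    """Apply the chain rule from E back to each weight."""
    s = forward(p, x, h_prev, c_prev)
    tanh_c = math.tanh(s["c"])
    dh = s["h"] - y                         # dE/dh
    dc = dh * s["o"] * (1.0 - tanh_c ** 2)  # dE/dc through h = tanh(c) * o
    # dE/dz for each gate's pre-activation z = w * x + u * h_prev
    dz = {
        "a": dc * s["i"] * (1.0 - s["a"] ** 2),        # tanh'(z) = 1 - a^2
        "i": dc * s["a"] * s["i"] * (1.0 - s["i"]),    # sigmoid'(z) = i * (1 - i)
        "f": dc * c_prev * s["f"] * (1.0 - s["f"]),
        "o": dh * tanh_c * s["o"] * (1.0 - s["o"]),
    }
    grads = {}
    for g in GATES:
        grads["w" + g] = dz[g] * x       # dz/dw = x
        grads["u" + g] = dz[g] * h_prev  # dz/du = h_prev
    return grads


# Made-up weights and inputs, purely for illustration.
params = {"wa": 0.45, "ua": 0.15, "wi": 0.95, "ui": 0.8,
          "wf": 0.7, "uf": 0.1, "wo": 0.6, "uo": 0.25}
grads = backward(params, x=1.0, h_prev=0.5, c_prev=0.3, y=0.8)
for name in sorted(grads):
    print(name, grads[name])
```

A handy way to catch mistakes in hand-derived gradients like these is to nudge one weight by a tiny amount, recompute the error with `loss`, and check that the change matches the value in `grads`.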

And thus these would be the final weights.
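As a rough illustration of how weights get updated once the derivatives are known, here is a small Python sketch that assumes plain gradient descent: each weight moves a small step against its gradient. The function name `gradient_descent_step`, the learning rate and the numbers are all hypothetical, chosen only to show the mechanics.

```python
def gradient_descent_step(weights, grads, learning_rate=0.1):
    """Return new weights: w_new = w - learning_rate * dE/dw (assumed update rule)."""
    return {name: w - learning_rate * grads[name] for name, w in weights.items()}


# Made-up weights and gradients, purely for illustration.
weights = {"wa": 0.45, "ua": 0.15}
example_grads = {"wa": 0.2, "ua": -0.1}

new_weights = gradient_descent_step(weights, example_grads, learning_rate=0.1)
print(new_weights)  # wa drops to about 0.43, ua rises to about 0.16
```

A positive gradient means the error grows as the weight grows, so the weight is reduced; a negative gradient pushes the weight up instead.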