Long short-term memory (LSTM) in one picture


Nothing more to say

I first approached LSTM networks about three years ago, while writing some code to make a seq2seq model work (a language translation model, perhaps… some NLP task, for sure). Their performance did not impress me much: they were slow and cumbersome, and I remember adding more layers just to… to get even worse results! In short, I do not like LSTM architectures; I have always filed them under “one day we’ll look back on this and laugh” (and I found a page showing that I am not the only one who dislikes these tools).

Some notation:

Plain LSTM architecture (full-size version here).
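
For readers who prefer formulas to pictures, here is a minimal NumPy sketch of the cell update that a plain LSTM performs at each time step. The names `lstm_step`, `W`, `U`, `b` and the gate ordering are my own conventions for this sketch, not notation taken from the figure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One time step of a plain LSTM cell.

    x_t    : input at time t, shape (n_in,)
    h_prev : hidden state at time t-1, shape (n_hid,)
    c_prev : cell state at time t-1, shape (n_hid,)
    W, U, b: input weights (4*n_hid, n_in), recurrent weights
             (4*n_hid, n_hid), and bias (4*n_hid,)
    """
    n = h_prev.shape[0]
    # All four gate pre-activations in one affine map, then split.
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0 * n:1 * n])   # forget gate: what to drop from c
    i = sigmoid(z[1 * n:2 * n])   # input gate: what to write to c
    o = sigmoid(z[2 * n:3 * n])   # output gate: what to expose as h
    g = np.tanh(z[3 * n:4 * n])   # candidate cell update
    c_t = f * c_prev + i * g      # new cell state
    h_t = o * np.tanh(c_t)        # new hidden state
    return h_t, c_t
```

Packing the four gates into a single matrix multiply is the usual implementation trick; the gate order above is arbitrary, and frameworks differ on it.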

Note: the picture shows no connection going out from the cell to a possible additional LSTM layer (usually drawn as an arrow pointing upward); it is understood that a copy of h at time t can be sent on to a further LSTM layer, as in the sketch below.
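
To make that note concrete, here is a sketch of the upward connection: each layer's sequence of hidden states h becomes the input sequence of the layer above. It reuses `lstm_step` from the sketch above; `run_stacked_lstm` and the parameter layout are illustrative choices of mine, not something shown in the picture.

```python
def run_stacked_lstm(xs, layer_params):
    """Run a stack of LSTM layers over a sequence of inputs xs.

    xs           : list of input vectors, one per time step
    layer_params : list of (W, U, b) tuples, one per layer
    """
    for W, U, b in layer_params:
        n = b.shape[0] // 4
        h, c = np.zeros(n), np.zeros(n)
        outs = []
        for x_t in xs:
            h, c = lstm_step(x_t, h, c, W, U, b)
            outs.append(h)   # a copy of h at time t goes upward
        xs = outs            # the next layer reads these as its inputs
    return xs                # hidden states of the top layer
```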