Original article was published on Deep Learning on Medium
Long Short-Term Memory networks (LSTMs for short) are a kind of RNN capable of learning long-term dependencies. Mostly used for time-series data, they can learn patterns from previous inputs and predict the future.
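To make the idea concrete, one step of a standard LSTM cell can be sketched in NumPy. This is only an illustrative sketch of the usual gate equations (input, forget, output gates plus a candidate cell state); the shapes and random weights here are made up for the example, not a trained model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (4*hidden, input_dim) input weights, U: (4*hidden, hidden) recurrent
    weights, b: (4*hidden,) biases, packed gate-by-gate."""
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:hidden])               # input gate
    f = sigmoid(z[hidden:2 * hidden])     # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden]) # output gate
    g = np.tanh(z[3 * hidden:])           # candidate cell state
    c = f * c_prev + i * g                # cell state carries long-term memory
    h = o * np.tanh(c)                    # hidden state (the layer's output)
    return h, c

rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W = rng.normal(size=(4 * hidden, input_dim))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(5):  # unroll over a short random sequence
    h, c = lstm_step(rng.normal(size=input_dim), h, c, W, U, b)
print(h.shape)  # (4,)
```

The cell state `c` is what lets the network retain information across many time steps; the gates decide what to keep, forget, and emit.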
In a neural network, features are learned from data, and an LSTM does the same. Designing the LSTM layers can be difficult at times. When designing one for the first time, we may get stuck choosing the right number of layers, their sizes, and so on. When I used an LSTM for stock price prediction, I really struggled with the design: sometimes my model would overfit and sometimes it would underfit.
Designing a good model usually involves a lot of trial and error. Here I have written, from my experience, about designing an LSTM for the first time and how I made it work.
So how do we design a model???
Answer: We don’t.
We pick something that works for a similar problem and use it. This approach works in many cases, but not in all of them, especially if we are working on a unique problem or doing bleeding-edge research.
We can borrow ideas from successful models while designing our own model.
How do we choose the number of layers and units?
Answer: Most of the time, it is easier to begin with a small model and gradually increase its size. Increasing the model size means adding layers and increasing the number of units per layer.
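To see what "increasing the model size" costs, we can count the parameters a stack of LSTM layers would have, using the standard formula of 4 × (input_dim + units + 1) × units per layer (four gates, each with input weights, recurrent weights, and a bias). The layer sizes below are made up purely for illustration:

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (input_dim + units + 1) * units

def stacked_lstm_params(input_dim, layer_units):
    """Total parameters for a stack of LSTM layers fed one into the next."""
    total, prev = 0, input_dim
    for units in layer_units:
        total += lstm_params(prev, units)
        prev = units  # each layer's output feeds the next layer
    return total

# Start small, then grow: every extra layer or unit adds parameters to fit.
print(stacked_lstm_params(1, [32]))      # 4352  (one small layer)
print(stacked_lstm_params(1, [64, 64]))  # 49920 (two larger layers)
```

A single univariate input with 32 units is already thousands of parameters; doubling the width and stacking a second layer roughly tenfolds that, which is why growing gradually keeps overfitting easier to diagnose.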
Yes, we could instead start with a large model and shrink it, but then how big should we start?
It is usually better to grow a model wider rather than deeper at first, because a very deep, skinny network is hard to optimize.
How to choose the activation function?
Answer: Using ReLU for every layer except the output layer is usually a safe place to start.
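ReLU itself is trivial to define, which is part of its appeal: it is cheap and avoids the saturation that slows learning with sigmoid-like activations. A minimal NumPy sketch (for a regression output layer you would leave the activation linear instead):

```python
import numpy as np

def relu(x):
    # Pass positives through unchanged, clamp negatives to zero
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))  # [0.  0.  0.  1.5]
```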
And what about regularization?
Answer: Most of the time, starting with L2 regularization is a good default.
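L2 regularization just adds the sum of squared weights, scaled by a small coefficient, to the training loss, which discourages large weights and so helps against overfitting. A small sketch with made-up numbers (the function name and `lam` coefficient are illustrative, not from any particular library):

```python
import numpy as np

def l2_penalized_loss(y_true, y_pred, weights, lam=0.01):
    """Mean squared error plus an L2 penalty on all weight arrays."""
    mse = np.mean((y_true - y_pred) ** 2)
    l2 = lam * sum(np.sum(w ** 2) for w in weights)  # penalize large weights
    return mse + l2

y_true = np.array([1.0, 2.0])
y_pred = np.array([1.5, 1.5])
weights = [np.array([[2.0, -1.0]])]
print(l2_penalized_loss(y_true, y_pred, weights))  # ≈ 0.3 (0.25 MSE + 0.05 penalty)
```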
What about the batch size?
Answer: Most other blogs suggest that a batch size of 32 works well for a neural network, but if the gradient is too noisy, we need to use a larger batch size.
If we get stuck in a local minimum, or run out of memory, we use smaller batches.
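Whatever size we settle on, mini-batching is just slicing a shuffled dataset into fixed-size chunks; each chunk yields one gradient update. A sketch with dummy data (the generator name and shapes here are made up for the example):

```python
import numpy as np

def minibatches(X, y, batch_size, seed=0):
    """Yield (X_batch, y_batch) pairs over a shuffled copy of the data."""
    idx = np.random.default_rng(seed).permutation(len(X))  # reshuffle per epoch
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]

X = np.arange(100, dtype=float).reshape(100, 1)
y = X.ravel() * 2
sizes = [len(xb) for xb, _ in minibatches(X, y, batch_size=32)]
print(sizes)  # [32, 32, 32, 4] -- the last batch is a remainder
```

Smaller batches mean noisier (but more frequent) updates and a lower memory footprint, which is the trade-off behind the advice above.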
How many epochs should we train for?
Answer: One epoch is one full pass of the entire dataset forward and backward through the neural network. We can choose a large number of epochs, i.e. >100, and use early stopping.
With early stopping, we halt training once the model stops improving on held-out validation data, which prevents overfitting. The early-stopping criterion is set up before fitting the model.
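The logic of early stopping with a patience window can be sketched in a few lines of plain Python. Here the list of validation losses stands in for one validation evaluation per epoch; the function name and numbers are made up for the example:

```python
def train_with_early_stopping(val_losses, patience=3):
    """Stop when validation loss hasn't improved for `patience` epochs.
    Returns (best_epoch, best_loss)."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0  # improvement: reset counter
        else:
            wait += 1
            if wait >= patience:
                break  # no improvement for `patience` epochs: stop training
    return best_epoch, best

# Loss improves, then plateaus: training halts early instead of running on.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.9]
print(train_with_early_stopping(losses))  # (2, 0.7)
```

In practice we would also keep a checkpoint of the weights from the best epoch and restore them after stopping.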
These are only some pointers for designing an LSTM network for the first time, for when you are stuck getting started. There are many other ways to design a network; these are just from my experience.
Thanks for reading.