Original article was published on Deep Learning on Medium
Let’s look at how the hidden state is calculated:
The previous hidden state is combined with the current input to form a vector. This vector carries information about the current input and the previous inputs. It passes through an activation function, and the result is the new hidden state. The activation function used here is tanh, because it keeps all values between negative one and positive one.
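The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `rnn_step`, the weight names `W_xh`, `W_hh`, `b_h`, and all the numbers are assumptions chosen by hand for the example.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        # Weighted sum of the current input and the previous hidden state.
        pre = b_h[i]
        pre += sum(w * x for w, x in zip(W_xh[i], x_t))
        pre += sum(w * h for w, h in zip(W_hh[i], h_prev))
        # tanh squashes the pre-activation into the range (-1, 1).
        h_t.append(math.tanh(pre))
    return h_t

# Toy, hand-picked weights (assumed values): 2 input features, 2 hidden units.
W_xh = [[0.5, -0.3], [0.8, 0.1]]
W_hh = [[0.2, 0.4], [-0.6, 0.3]]
b_h = [0.0, 0.1]

h = [0.0, 0.0]  # initial hidden state
sequence = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
for x_t in sequence:
    # The same weights are reused at every step; only h changes.
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    print(h)
```

Each printed hidden state depends on the current input and, through `h`, on all earlier inputs, which is the feedback loop in action.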
When the training process is completed, the RNN is able to assign weights to every input feature by itself.
The features here are the individual words in the sentence. Through gradient descent, the RNN learns what information is passed along the feedback loop. During backpropagation, the RNN suffers from the vanishing gradient problem: gradients shrink as they are propagated back through many time steps, so early inputs have little influence on learning. This gives the network a short-term memory. To mitigate this short-term memory, two specialized RNNs were developed. They have internal mechanisms called gates, which regulate the flow of information so that the network keeps only the information relevant for making predictions.
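To get a feel for the vanishing gradient, here is a small numeric sketch. It assumes a deliberately simplified scalar RNN, h_t = tanh(w * h_{t-1} + x_t), and the helper name `gradient_through_time` and the values of `w` and the inputs are made up for illustration. It tracks how strongly the latest hidden state still depends on the initial one.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_t / d h_0| after each step of h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad, history = h0, 1.0, []
    for x_t in inputs:
        h = math.tanh(w * h + x_t)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
        history.append(abs(grad))
    return history

# Assumed toy setting: recurrent weight 0.5, constant input 0.5, 20 time steps.
grads = gradient_through_time(w=0.5, inputs=[0.5] * 20)
print(grads[0], grads[-1])
```

Because every step multiplies the gradient by a factor smaller than one in this setting, the contribution of the earliest step becomes negligible after a few dozen steps. That is the short-term memory the gated variants try to address.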
Long Short Term Memory (LSTM)
LSTM has feedback connections. It has a forget gate, an input gate, and an output gate. During training, these gates learn which information is relevant to keep and which to forget. Each gate uses a sigmoid activation function. The LSTM also has a cell state, which acts as a transport highway carrying relevant information all the way down the sequence chain. It acts as the memory of the network. This helps the LSTM process an entire sequence of data and gives it a longer memory than a standard RNN.
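The gates and the cell state can be sketched as a single LSTM step in plain Python. This follows the commonly used LSTM formulation, which the article does not spell out, so treat it as an assumption. The parameter names (`W_f`, `W_i`, `W_o`, `W_g`) and the helper `init_params` are hypothetical and exist only for this example.

```python
import math
import random

def sigmoid(z):
    """Logistic function: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; returns the new hidden state and the new cell state."""
    v = h_prev + x_t  # concatenate previous hidden state and current input
    f = [sigmoid(z) for z in affine(p["W_f"], v, p["b_f"])]    # forget gate
    i = [sigmoid(z) for z in affine(p["W_i"], v, p["b_i"])]    # input gate
    o = [sigmoid(z) for z in affine(p["W_o"], v, p["b_o"])]    # output gate
    g = [math.tanh(z) for z in affine(p["W_g"], v, p["b_g"])]  # candidate values
    # Cell state: keep part of the old memory, add part of the candidate.
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Hidden state: a gated, squashed view of the cell state.
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t

def init_params(n_in, n_hid, seed=0):
    """Hypothetical helper: small random weights, zero biases."""
    rng = random.Random(seed)
    p = {}
    for name in ("f", "i", "o", "g"):
        p["W_" + name] = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + n_in)]
                          for _ in range(n_hid)]
        p["b_" + name] = [0.0] * n_hid
    return p

params = init_params(n_in=2, n_hid=3)
h, c = [0.0] * 3, [0.0] * 3
for x_t in [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]:
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

Note that the cell state `c` is updated only by element-wise scaling and addition, which is what lets information travel along the "highway" across many steps.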
Gated Recurrent Unit (GRU)
GRU is similar to LSTM but has fewer parameters, and it can perform well on smaller datasets. It uses the hidden state, instead of a cell state, to transfer information. It has a reset gate and an update gate. The reset gate decides how much past information to forget. The update gate decides what information to throw away and what new information to add.
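A single GRU step can be sketched the same way. The article does not give equations, so this uses one common formulation as an assumption. Note that libraries differ on whether the update gate multiplies the old state or the candidate. The parameter names (`W_z`, `W_r`, `W_h`) and the helper `init_params` are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Logistic function: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def gru_step(x_t, h_prev, p):
    """One GRU step; returns the new hidden state (there is no cell state)."""
    v = h_prev + x_t  # concatenate previous hidden state and current input
    z = [sigmoid(a) for a in affine(p["W_z"], v, p["b_z"])]  # update gate
    r = [sigmoid(a) for a in affine(p["W_r"], v, p["b_r"])]  # reset gate
    # The reset gate scales how much of the past state feeds the candidate.
    v_reset = [r_k * h_k for r_k, h_k in zip(r, h_prev)] + x_t
    h_cand = [math.tanh(a) for a in affine(p["W_h"], v_reset, p["b_h"])]
    # The update gate blends the old state with the candidate
    # (assumed convention: z weights the candidate, 1 - z the old state).
    return [(1.0 - z_k) * h_k + z_k * c_k
            for z_k, h_k, c_k in zip(z, h_prev, h_cand)]

def init_params(n_in, n_hid, seed=0):
    """Hypothetical helper: small random weights, zero biases."""
    rng = random.Random(seed)
    p = {}
    for name in ("z", "r", "h"):
        p["W_" + name] = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + n_in)]
                          for _ in range(n_hid)]
        p["b_" + name] = [0.0] * n_hid
    return p

params = init_params(n_in=2, n_hid=3)
h = [0.0] * 3
for x_t in [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]:
    h = gru_step(x_t, h, params)
print(h)
```

This sketch needs three weight matrices, where a standard LSTM cell of the same size needs four, which is where the smaller parameter count comes from.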
Applications of RNN:
- Sentiment analysis
- Text analysis
- Speech Recognition
- Language translation
- Video analysis
- Stock prediction
How Is RNN Different from CNN?
- Convolutional, pooling, and fully connected layers are the three layer types in a CNN. These layers transform the data to produce the prediction. An RNN uses a looping mechanism over the input data to produce its output.
- In a CNN, every hidden layer has a separate function to perform. In an RNN, the output depends on the looping mechanism in the hidden layers.
- CNN is best suited for image classification. RNN, on the other hand, is suited for sequential data such as text.