Source: Deep Learning on Medium
In sequence learning problems, we know that the true output at timestep ‘t’ is dependent on all the inputs that the model has seen up to timestep ‘t’. Since we don’t know the true relationship, we need to come up with an approximation: a function that depends on all the previous inputs.
The key thing to note here is that the task does not change from one timestep to the next, whether we are predicting the next character or tagging the part of speech of a word. What changes at every time step is the input to the function, because for longer sentences the function needs to keep track of a larger set of words.
In other words, we need to define a function that has these characteristics:
- Ensure that the output yₜ depends on all previous inputs
- Ensure that the function can handle a variable number of inputs
- Ensure that the same function is executed at every time step
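The three requirements above can be sketched in a few lines of code: a single function, applied repeatedly, carries a fixed-size state that summarizes everything seen so far. This is a minimal illustration (the names `step`, `run`, and `state_dim` are ours, not from the article):

```python
import numpy as np

state_dim = 4  # illustrative fixed size of the running state

def step(state, x):
    # The same function at every time step (requirement 3); its output
    # depends on the running state, i.e. on all previous inputs (requirement 1).
    return np.tanh(state + x)

def run(inputs):
    # Works for sequences of any length (requirement 2).
    state = np.zeros(state_dim)
    for x in inputs:
        state = step(state, x)
    return state

short = run(np.random.randn(3, state_dim))   # 3 timesteps
long = run(np.random.randn(10, state_dim))   # 10 timesteps
assert short.shape == long.shape == (state_dim,)
```

Note that the output has the same shape regardless of sequence length; the state acts as a summary of the whole prefix.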
Recurrent Neural Networks
Recurrent Neural Networks (RNNs) are a type of neural network in which the output from the previous step is fed as input to the current step.
In an RNN, the output of the first time step is fed as input, along with the original input, to the next time step.
The input to the function at each time step is denoted xᵢ (shown in orange in the figure). The weights associated with the input are denoted by a matrix U, and the hidden representation (sᵢ) of the word is computed as a function of the output of the previous time step and the current input, along with a bias. The hidden representation (sᵢ) is given by the following equation, where σ is a nonlinearity such as tanh and W is the matrix of recurrent weights:

sᵢ = σ(U·xᵢ + W·sᵢ₋₁ + b)
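The hidden-state update described above can be written directly in NumPy. The dimensions and the tanh nonlinearity are assumptions for illustration; U, the recurrent weights, and the bias follow the article's description:

```python
import numpy as np

input_dim, hidden_dim = 3, 5  # illustrative sizes
rng = np.random.default_rng(0)
U = rng.standard_normal((hidden_dim, input_dim))   # input weights
W = rng.standard_normal((hidden_dim, hidden_dim))  # recurrent weights
b = np.zeros(hidden_dim)                           # bias

def hidden_step(s_prev, x):
    # s_i = tanh(U·x_i + W·s_{i-1} + b)
    return np.tanh(U @ x + W @ s_prev + b)

s = np.zeros(hidden_dim)
for x in rng.standard_normal((4, input_dim)):  # a 4-timestep sequence
    s = hidden_step(s, x)
print(s.shape)  # (5,)
```

Because sᵢ is fed back into the next step, the final state depends on every input in the sequence.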
Once we compute the hidden representation of the input, the final output (yᵢ) from the network is a softmax over the hidden representation, its associated weights, and a bias:

yᵢ = softmax(V·sᵢ + c)

With this, we have an approximate function that satisfies all three conditions we set out to meet for sequence learning problems.
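The output layer can be sketched the same way. Here V and c are hypothetical names for the output weights and bias (the article does not name them); applying softmax to V·sᵢ + c turns the hidden representation into a probability distribution over outputs:

```python
import numpy as np

hidden_dim, vocab_size = 5, 8  # illustrative sizes
rng = np.random.default_rng(1)
V = rng.standard_normal((vocab_size, hidden_dim))  # output weights (assumed name)
c = np.zeros(vocab_size)                           # output bias (assumed name)

def softmax(z):
    z = z - z.max()   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

s = rng.standard_normal(hidden_dim)  # a hidden representation s_i
y = softmax(V @ s + c)               # y_i = softmax(V·s_i + c)
assert abs(y.sum() - 1.0) < 1e-9     # a valid probability distribution
```

In a next-character prediction task, for example, y would hold one probability per character in the vocabulary.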