Source: Deep Learning on Medium
Frequently Asked Questions
Q: Since we pass the whole ground truth sequence through the RNN model, is it possible for the model to “cheat” by simply memorizing the ground truth?
A: No. At timestep t, the model's input is the ground truth at timestep t – 1, and its hidden state has been updated only by the ground truths from timesteps 1 through t – 2. The model can never peek into the future.
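This is easier to see in code. The sketch below is a minimal teacher-forcing loop with a toy NumPy RNN (the weight shapes, vocabulary size, and start token are illustrative assumptions, not from a particular library): the input fed at each step is the *previous* ground-truth token, so the hidden state only ever contains information about tokens strictly before the one being predicted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-layer RNN; sizes are arbitrary, chosen for illustration.
vocab, hidden = 5, 8
Wxh = rng.normal(size=(vocab, hidden)) * 0.1
Whh = rng.normal(size=(hidden, hidden)) * 0.1
Who = rng.normal(size=(hidden, vocab)) * 0.1

def one_hot(i):
    v = np.zeros(vocab)
    v[i] = 1.0
    return v

targets = [2, 4, 1, 3]           # ground-truth sequence y_1 .. y_T
start = 0                        # hypothetical start-of-sequence token

# Teacher forcing: the input at timestep t is the ground truth at t - 1.
inputs = [start] + targets[:-1]

h = np.zeros(hidden)
logits = []
for x in inputs:
    # h depends only on ground truths strictly before the current target,
    # so the model never sees the token it is being asked to predict.
    h = np.tanh(one_hot(x) @ Wxh + h @ Whh)
    logits.append(h @ Who)       # prediction for the current target
```

Note that `logits[t]` is produced before the model has ever received `targets[t]` as input, which is exactly why memorizing the teacher signal is impossible.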
Q: Is it necessary to update the loss at each timestep?
A: No. An alternative approach is to store the predictions at all timesteps in, say, a Python list, and then compute all the losses in one go.
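The two approaches are numerically equivalent because the sequence loss is just a sum over timesteps. A minimal sketch (the logits here are random stand-ins for model predictions, and `cross_entropy` is a hand-rolled helper, not a library function):

```python
import numpy as np

def cross_entropy(logits, target):
    # Numerically stable log-softmax followed by negative log-likelihood.
    z = logits - logits.max()
    return -(z[target] - np.log(np.exp(z).sum()))

rng = np.random.default_rng(1)
T, vocab = 4, 5
step_logits = [rng.normal(size=vocab) for _ in range(T)]  # stand-in predictions
targets = [2, 0, 4, 1]

# Option 1: update the loss at every timestep.
loss_per_step = 0.0
for t in range(T):
    loss_per_step += cross_entropy(step_logits[t], targets[t])

# Option 2: store the predictions in a list, then compute all losses in one go.
loss_batched = sum(cross_entropy(l, y) for l, y in zip(step_logits, targets))

assert np.isclose(loss_per_step, loss_batched)
```

In practice, frameworks that build a computation graph (e.g. PyTorch) often favor the second style, since stacking the stored predictions lets the loss be computed in a single vectorized call.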
Q: Is Teacher Forcing used outside Natural Language Processing?
A: Yes. It can be used in any model that outputs sequences, e.g. in time series forecasting.
Q: Is Teacher Forcing used outside Recurrent Neural Networks?
A: Yes. It is used in other autoregressive models such as the Transformer.
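In a decoder-only Transformer, teacher forcing is implemented with a causal (look-ahead) attention mask rather than a sequential loop: the whole ground-truth sequence is fed in at once, and the mask guarantees each position can only attend to earlier positions. A minimal sketch of such a mask (the boolean convention, True = blocked, is one common choice, not a universal standard):

```python
import numpy as np

T = 4  # sequence length

# mask[t, s] is True where attention from position t to position s
# must be blocked, i.e. wherever s > t (the strict upper triangle).
# This lets the model consume the entire ground-truth sequence in
# parallel while each position still predicts without seeing the future.
mask = np.triu(np.ones((T, T), dtype=bool), k=1)
```

In attention, the blocked entries are typically set to a large negative value before the softmax, so future positions receive (effectively) zero weight.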