What is Teacher Forcing?


Frequently Asked Questions

Q: Since we pass the whole ground truth sequence through the RNN model, is it possible for the model to “cheat” by simply memorizing the ground truth?

A: No. At timestep t, the model's input is the ground truth at timestep t – 1, and its hidden state has been updated only by the ground truths from timesteps 1 through t – 2. The model can never peek into the future.
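
To make this concrete, here is a minimal PyTorch-style sketch of a teacher-forced decoding loop. The model, the names (embed, rnn_cell, head), and all shapes are illustrative assumptions for this example, not code from the article:

```python
import torch
import torch.nn as nn

# Toy sizes; every name and shape here is an illustrative assumption.
vocab_size, emb_dim, hidden_dim, batch = 1000, 64, 128, 8
embed = nn.Embedding(vocab_size, emb_dim)
rnn_cell = nn.GRUCell(emb_dim, hidden_dim)
head = nn.Linear(hidden_dim, vocab_size)

targets = torch.randint(0, vocab_size, (batch, 20))  # ground-truth sequence
h = torch.zeros(batch, hidden_dim)                   # initial hidden state

predictions = []
for t in range(1, targets.size(1)):
    # The input at timestep t is the ground truth at timestep t - 1,
    # and h so far reflects only ground truths before that: the model
    # never sees the token it is asked to predict, or anything later.
    x = embed(targets[:, t - 1])
    h = rnn_cell(x, h)
    predictions.append(head(h))  # logits for the token at timestep t
```

Because the ground truth token at timestep t only ever enters the model at timestep t + 1, memorizing the input would just mean learning to copy the previous token, which is exactly what the loss penalizes.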

Q: Is it necessary to update the loss at each timestep?

A: No. An alternative is to store the predictions from every timestep in, say, a Python list, and then compute the loss over all of them in one go.
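
A minimal sketch of that "one go" approach, again in assumed PyTorch; the predictions list and all shapes are hypothetical stand-ins for the tensors stored during the loop above:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the stored per-timestep predictions and
# the ground truth; shapes are illustrative only.
vocab_size, batch, seq_len = 1000, 8, 20
predictions = [torch.randn(batch, vocab_size, requires_grad=True)
               for _ in range(seq_len - 1)]          # one logits tensor per timestep
targets = torch.randint(0, vocab_size, (batch, seq_len))

logits = torch.stack(predictions, dim=1)             # (batch, seq_len - 1, vocab)
loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),                  # fold time into the batch dim
    targets[:, 1:].reshape(-1),                      # the step-t prediction targets token t
)
loss.backward()                                      # a single backward pass
```

The two approaches are mathematically equivalent when the per-timestep losses are summed or averaged; the batched version simply trades a little memory for fewer loss and backward calls.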

Q: Is Teacher Forcing used outside Natural Language Processing?

A: Yes. It can be used in any model that outputs sequences, e.g. in time series forecasting.

Q: Is Teacher Forcing used outside Recurrent Neural Networks?

A: Yes. It is used in other autoregressive models such as the Transformer, as sketched below.
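
As a rough sketch of what this looks like in a Transformer (assuming a PyTorch decoder-only stack built from nn.TransformerEncoder; sizes are illustrative and positional encodings are omitted for brevity): the ground-truth sequence is shifted right and fed in all at once, with a causal mask doing the job of the RNN's timestep loop:

```python
import torch
import torch.nn as nn

# Illustrative sizes; none of these come from the article.
vocab_size, d_model, batch, seq_len = 1000, 64, 8, 20
targets = torch.randint(0, vocab_size, (batch, seq_len))

decoder_inputs = targets[:, :-1]   # ground truth shifted right: tokens 0 .. T-2
labels = targets[:, 1:]            # tokens 1 .. T-1, predicted in parallel

# A causal mask lets position t attend only to positions <= t, so the
# whole ground-truth sequence is fed in at once without peeking ahead.
L = seq_len - 1
causal_mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)

embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
blocks = nn.TransformerEncoder(layer, num_layers=2)  # decoder-only style stack
head = nn.Linear(d_model, vocab_size)

hidden = blocks(embed(decoder_inputs), mask=causal_mask)  # (batch, L, d_model)
logits = head(hidden)  # compare against labels with cross-entropy, as above
```

This is why Transformer training is so parallel: teacher forcing plus masking lets every position be trained simultaneously, instead of one timestep at a time as in an RNN.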