A Primer on Multi-task Learning in NLP — Part 1

Original article was published by Neeraj Varshney on Deep Learning on Medium


The motivation behind Multi-task Learning

Biologically, MTL can be seen as inspired by the way we humans learn: to learn a new task, we often apply knowledge acquired from related tasks, and we often first learn the tasks that give us the skills needed to master more complex ones. From a machine learning perspective, learning multiple related tasks jointly provides a useful inductive bias that helps models generalize better. In simple words, it is beneficial to learn multiple (related) tasks jointly so that the knowledge gained on one task can benefit the others.

A specialist model (trained via single-task learning) is laser-focused on a single task and requires a large amount of task-specific labeled data, which is not always available. MTL mitigates this requirement: it can leverage knowledge from related tasks and can outperform specialist models in limited-data scenarios.

Furthermore, MTL benefits from a regularization effect: joint training alleviates overfitting to any single task, pushing the learned representations to be more universal across tasks. Finally, MTL has advantages in terms of efficiency and storage, since a single model can handle multiple tasks, which makes production deployment much simpler.
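To make the "single model, multiple tasks" idea concrete, here is a minimal sketch of hard parameter sharing, the most common MTL setup: one shared encoder whose parameters are reused by every task, plus a small task-specific head per task. This assumes PyTorch, and the task names, vocabulary size, and architecture are purely illustrative, not taken from the article.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Hard parameter sharing: one shared encoder, one small head per task."""
    def __init__(self, vocab_size=30000, hidden_dim=256):
        super().__init__()
        # Shared layers: reused by every task (this is where knowledge transfers).
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        self.encoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Task-specific heads (task names and label counts are hypothetical).
        self.heads = nn.ModuleDict({
            "sentiment": nn.Linear(hidden_dim, 2),   # binary classification
            "topic": nn.Linear(hidden_dim, 10),      # 10-way classification
        })

    def forward(self, token_ids, task):
        embedded = self.embedding(token_ids)
        _, (hidden, _) = self.encoder(embedded)
        # Route the shared representation to the requested task's head.
        return self.heads[task](hidden[-1])

model = MultiTaskModel()
batch = torch.randint(0, 30000, (4, 12))           # 4 sequences of 12 token ids
sentiment_logits = model(batch, task="sentiment")
topic_logits = model(batch, task="topic")
print(sentiment_logits.shape, topic_logits.shape)  # torch.Size([4, 2]) torch.Size([4, 10])
```

Because the encoder parameters are shared, deployment requires only one model artifact, and each additional task costs just one extra output layer.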

Having mentioned the advantages of MTL, there are a few downsides as well. MTL can be extremely fragile and sensitive to both the selected tasks and the training process, which can lead to models that significantly underperform their single-task counterparts.

To summarize, in the endeavor to build general-purpose AI systems, MTL has a pivotal role to play.