How to Compress a Model by Knowledge Distillation

Source: Deep Learning on Medium

Model compression is important for deploying a BERT-like model in production. One common compression strategy is knowledge distillation, in which a small student model is trained to reproduce the behavior of a large teacher model. A sketch of the idea in code follows below.
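As a rough illustration (not the exact recipe from this post), a distillation objective typically mixes a soft-target term, matching the teacher's temperature-softened output distribution, with the usual cross-entropy on the true labels. The PyTorch sketch below uses illustrative names such as `distillation_loss`, `temperature`, and `alpha`; the specific values are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and hard-label cross-entropy."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Toy usage: 8 examples, 3 classes. In practice the teacher (e.g. BERT)
# runs in eval mode with gradients disabled, and only the student is updated.
student_logits = torch.randn(8, 3, requires_grad=True)
teacher_logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

The temperature controls how much of the teacher's "dark knowledge" (its relative confidence over wrong classes) is exposed to the student, while `alpha` balances imitation of the teacher against fitting the labels directly.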
