[Paper Review] SpanBERT: Improving Pre-training by Representing and Predicting Spans

Source: Deep Learning on Medium

First, let's take a look at the BERT model previously proposed by Google.

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.

Then, in July 2019, Facebook proposed a new architecture, SpanBERT, which substantially improves on BERT's performance. BERT was originally pre-trained on two tasks: Task 1, Masked Language Modeling (MLM), and Task 2, Next Sentence Prediction (NSP). Facebook Research later found that the bi-sequence (two-segment) training BERT uses for NSP actually drags performance down, and that single-sequence training alone achieves good results, so SpanBERT drops the NSP-style bi-sequence design entirely (my professor noted that this design is not necessarily a bad approach in itself; perhaps it was simply used improperly, so its effect was never fully realized).
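To make the two pre-training tasks and the single-sequence alternative concrete, here is a toy Python sketch of the corresponding inputs. It is only an illustration under my own assumptions (made-up sentences, a simplified 15% masking rule), not the official BERT or SpanBERT implementation.

```python
import random

# Toy sketch (not the official BERT/SpanBERT code) of the inputs for the
# two BERT pre-training tasks described above; the sentences are made up.
rng = random.Random(0)
sent_a = "the quick brown fox jumps over the lazy dog".split()
sent_b = "it then sleeps in the warm afternoon sun".split()

def mlm_mask(tokens, mask_rate=0.15):
    """Task 1 (MLM): mask roughly 15% of individual tokens at random;
    the model is trained to recover the original tokens."""
    return [tok if rng.random() > mask_rate else "[MASK]" for tok in tokens]

# Task 2 (NSP): BERT concatenates two segments and predicts whether the
# second one really follows the first (label 1) or was sampled at random (label 0).
nsp_input = ["[CLS]", *sent_a, "[SEP]", *sent_b, "[SEP]"]

# Single-sequence training (what SpanBERT uses): drop NSP entirely and feed
# one contiguous segment of text taken from a single document.
single_sequence_input = ["[CLS]", *sent_a, *sent_b, "[SEP]"]

print(mlm_mask(sent_a))
print(nsp_input)
print(single_sequence_input)
```

The point of the last line is that the model always sees one contiguous block of text from a single document, rather than two possibly unrelated segments stitched together for the NSP label.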