Original article was published by Jae Duk Seo on Artificial Intelligence on Medium
The goal is to make text generation much better: instead of using only the input, the model is also given some context. The generated text can then be adapted to different applications, such as classification and more. One of the applications is fact-checking, etc…
And the model they use is pretty interesting and different. It is quite a complex architecture, but it gives superior results, and it is open-source!
We can generate text using masked tokens, either with an RNN or in an auto-regressive way. The idea behind this paper is adding context: rather than conditioning only on the input sequence, we also incorporate other things, such as documents.
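As a rough illustration of the auto-regressive idea, here is a toy sketch (my own, not the paper's model): a bigram "language model" is built from a context document, and new tokens are generated one at a time, each step conditioned on the previous output.

```python
# Toy sketch (not the paper's model): greedy auto-regressive decoding
# with a bigram "language model" built from a context document.
from collections import defaultdict, Counter

def build_bigram_model(text):
    """Count word -> next-word frequencies in the context."""
    tokens = text.split()
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def generate(model, start, max_len=10):
    """Greedily pick the most frequent continuation, token by token."""
    out = [start]
    for _ in range(max_len - 1):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

context = "the model retrieves documents and the model generates text"
lm = build_bigram_model(context)
print(generate(lm, "the"))  # each step conditions on the previous token
```

A real model predicts with a neural network over the whole prefix instead of a bigram table, but the loop is the same: generate a token, append it, repeat.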
A high-level overview of how the model works: some passages from Wikipedia are encoded, and that encoded output is paired with the input.
A Sentence-BERT architecture is used.
It is a Siamese network, a two-tower architecture, where the idea is to compare different documents. This is a very interesting way of training an NLP model, and it can be applied in vision as well. (I guess this is for comparing documents at scale.)
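The two-tower idea can be sketched in a few lines. This is only an illustration under my own assumptions: a bag-of-words vector stands in for the Sentence-BERT encoder, and documents are ranked by cosine similarity to the query.

```python
# Toy two-tower sketch (an assumption, not the paper's actual encoder):
# each "tower" maps text to a vector; documents are ranked by
# cosine similarity against the query vector.
import math
from collections import Counter

def encode(text):
    """Stand-in tower: a bag-of-words vector instead of Sentence-BERT."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Score every document against the query and return the top k."""
    q = encode(query)
    ranked = sorted(docs, key=lambda d: cosine(q, encode(d)), reverse=True)
    return ranked[:k]

docs = [
    "the eiffel tower is in paris",
    "bert is a language model",
    "pasta is cooked in boiling water",
]
print(retrieve("which city is the eiffel tower in", docs))
```

Because the two towers produce vectors independently, document vectors can be precomputed and indexed, which is what makes comparison at scale practical.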
Given the previous outputs, we generate new tokens (super cool). The model can either predict word by word, or encode the sentence as a whole and then start to predict.
They are able to leverage pre-trained models provided by Hugging Face. One application for this model is open-ended question answering (any kind of question can be asked).
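To give a feel for the open-ended QA application, here is a toy retrieve-then-answer pipeline (entirely my illustration, not the paper's system): pick the passage that overlaps most with the question, then return the passage words the question did not already contain.

```python
# Toy open-domain QA sketch (an illustration, not the paper's pipeline):
# retrieve the best-matching passage, then "answer" with the words of
# that passage that do not appear in the question.
def answer(question, passages):
    q_words = set(question.lower().split())
    # pick the passage sharing the most words with the question
    best = max(passages, key=lambda p: len(q_words & set(p.lower().split())))
    # keep the passage words the question did not already contain
    return " ".join(w for w in best.split() if w.lower() not in q_words)

passages = [
    "the capital of france is paris",
    "the capital of japan is tokyo",
]
print(answer("what is the capital of france", passages))
```

The real model replaces both steps with neural components (a learned retriever and a generator), but the shape of the pipeline is the same: retrieve evidence, then produce an answer conditioned on it.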
And the rest of the video is just researcher flex, showing how they became SOTA. Future work includes using longer sequences.