Source: Deep Learning on Medium
Stanford has built the SQUAD and SQUAD2.0 datasets for this task. The latter is comprised of ~19k paragraphs each with multiple questions. There are roughly 130k questions total, some of which cannot be answered with the given context, by design. Purportedly, this is to allow training of systems that can admit that they don’t know the answer.
The team at huggingface has created high quality implementations for these BERT-based architectures. Moreover, they provide pre-trained models that have been fine-tuned on different datasets, like SQUAD. They also provide some high-level wrappers that make integrating these models into your project ridiculously easy. Here’s a little example:
from transformers import pipeline
qapipe = pipeline('question-answering')
'question': """how can question answering service produce answers""",
'context': """One such task is reading comprehension. Given a passage of text, we can ask questions about the passage that can be answered by referencing short excerpts from the text. For instance, if we were to ask about this paragraph, "how can a question be answered in a reading comprehension task" ..."""
'answer': 'referencing short excerpts from the text.'}
An import and two lines of code. That’s it.
While there are a lot of common words between the question and the context, notice how the subject, structure and order of the sentences differ. I’m not saying that shallow learning techniques based on linguistic theory aren’t capable of handling this specific case. However with all the possible subtleties and variations previous methods have not been able to perform close to the level of average humans. That has changed in just the last year with advances in BERT-based architectures. Numerous teams around the world have produced models that match or exceed human-level accuracy.