Does your AI system ‘know’ what it does not know?



Question answering over unstructured text is considered a task worthy of evaluating even a human learning a new language, and it was long expected to be hard for AI systems to do with high precision. There has been a lot of progress in the field lately: recent state-of-the-art algorithms have been shown to exceed human performance on specific datasets such as SQuAD.

While these algorithms are effective, it has been observed that most of them rely on superficial signals such as local context similarity and global term frequency to extract answers from documents. These systems are vulnerable in scenarios where there is little to no lexical overlap between the query and the document, where multi-sentence comprehension is required, and under many other simple distortions.
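To make the “shallow signal” point concrete, here is a toy illustration (a sketch, not any of the actual models discussed above): a plain lexical-overlap score between query and sentence looks reasonable until the answer is phrased with different words.

```python
import re

def tokens(text: str) -> set:
    """Lower-case word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def lexical_overlap(query: str, sentence: str) -> float:
    """Fraction of query tokens that also appear in the sentence."""
    q, s = tokens(query), tokens(sentence)
    return len(q & s) / len(q) if q else 0.0

query = "When was the bridge completed?"
# High overlap because the sentence reuses the query's words.
print(lexical_overlap(query, "The bridge was completed in 1937."))
# Near-zero overlap even though it contains the same answer, paraphrased.
print(lexical_overlap(query, "Construction finished in 1937."))
```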

The SQuAD 2.0 dataset appeared as a welcome change, encouraging researchers to build models that not only predict the answer span given a query and a document, but also evaluate whether the query is answerable from that document at all. I will still refrain from claiming that the models which perform well on SQuAD 2.0 understand the underlying narrative rather than relying on shallow pattern matching or salience, but the dataset has definitely paved the way towards smarter reading-comprehension models.

AI machine comprehension has great potential in building intelligent agents. Consider a system which can process dozens of documents and answer any query you ask about their content. One of the key challenges for such a system is to know what it does not know! Most systems rely on the confidence of span prediction to tackle answerability: when the span is predicted with a probability below some threshold, the question is treated as non-answerable. In practice, this does not perform well.
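Here is a minimal sketch of that threshold approach, using the Hugging Face question-answering pipeline. The model checkpoint and the 0.5 cut-off are assumptions for illustration only; this is not the system discussed in this post.

```python
from transformers import pipeline

# Any extractive QA checkpoint works here; this one is chosen for illustration.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

THRESHOLD = 0.5  # assumed cut-off; tuning this value rarely generalises well

def answer_or_abstain(question: str, context: str):
    """Return the extracted span, or None if its confidence is below the threshold."""
    result = qa(question=question, context=context)
    if result["score"] < THRESHOLD:
        return None  # treated as "non-answerable"
    return result["answer"]

context = "The Golden Gate Bridge opened to traffic in 1937."
print(answer_or_abstain("When did the Golden Gate Bridge open?", context))
# The model may still extract some span for an unrelated question,
# just with a lower (but not always low enough) confidence.
print(answer_or_abstain("Who designed the Eiffel Tower?", context))
```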

Let’s look at how a QnA system built specifically to infer answerability can help avoid false extractions.

We have built a system which, given a text piece and a query, can evaluate whether the query can be answered using that text. Consider some examples which demonstrate the effectiveness of the system.

‘Statement’, in each of the examples below, is the text piece extracted as one of the potential candidates which may or may not contain the answer to the query. ‘Query’, in each example, is the user query against the document.
The score below each example is the predicted answerability of the query given the statement. The examples are self-explanatory: scores below 0.5 are to be considered non-answerable, while scores above 0.5 mean the query can be answered given the context.
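For readers who want a feel for how such a score could be produced, here is a rough sketch of one possible setup: a binary cross-encoder over the (query, statement) pair whose sigmoid output is read as the answerability probability. The checkpoint name is hypothetical and this is not a description of the system we built; treat it purely as an illustration of the scoring convention above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical fine-tuned checkpoint (e.g. trained on SQuAD 2.0 answerability labels).
MODEL_NAME = "your-org/answerability-cross-encoder"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=1)
model.eval()

def answerability_score(query: str, statement: str) -> float:
    """Score the (query, statement) pair; > 0.5 means answerable, < 0.5 means not."""
    inputs = tokenizer(query, statement, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.sigmoid(logits).item()

score = answerability_score(
    "When did the Golden Gate Bridge open?",
    "The Golden Gate Bridge opened to traffic in 1937.",
)
print(f"answerability: {score:.2f}")
```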