In the previous post, we saw benchmarks comparing our recently developed NLU technology against the RASA and Snips alternatives. Building conversational technology obviously takes more than a good NLU intent-detection system, but intent detection is such an integral part of it that we continued to rank our system against more alternatives.
As before, we use a collection of three open corpora suited to evaluating conversational interfaces. For training and benchmarking, we replicated the methodology in Evaluating Natural Language Understanding Services for Conversational Question Answering Systems, and the results for the commercial NLU alternatives are taken from that same study.
Here are the results for the three data-sets and all the different services¹:
We are delighted to see our NLU outscore all competitors on every metric for two of the three data-sets!
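The metrics behind these scores aren't spelled out here. As a rough illustration, below is a minimal sketch of micro-averaged precision, recall and F1 for intent detection, assuming (as in the Braun et al. study) that these are the metrics compared, that each utterance carries exactly one gold intent, and that a service may return no intent at all (`None`). The function name `intent_scores` and the example labels are hypothetical, not part of any benchmark tooling.

```python
from typing import Optional, Sequence, Tuple


def intent_scores(
    gold: Sequence[str], predicted: Sequence[Optional[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 for single-label intent detection.

    Assumption (hypothetical scoring rule): a prediction of None means the
    service returned no intent, so it counts as a false negative only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if p == g:
            tp += 1  # correct intent
        elif p is None:
            fn += 1  # no intent returned: the gold intent was missed
        else:
            fp += 1  # a wrong intent was predicted...
            fn += 1  # ...and the gold intent was missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels for four test utterances.
    gold = ["FindConnection", "DepartureTime", "FindConnection", "DepartureTime"]
    predicted = ["FindConnection", "FindConnection", None, "DepartureTime"]
    p, r, f = intent_scores(gold, predicted)
    print(round(p, 3), r, round(f, 3))  # prints: 0.667 0.5 0.571
```

Treating a missing prediction as a false negative only is what lets precision and recall diverge; if every utterance always received some intent, micro-averaged precision, recall and F1 would all collapse to plain accuracy.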
This is, of course, only a quick overview of intent-detection performance; more in-depth analyses are coming!
These results come from our raw models, without any improvement techniques. We look forward to challenging ourselves and seeing how much we can boost the performance of these models with a whole range of machine learning optimization tricks.
We also have more data-sets for intent detection and for slot-filling. (Don’t know what slot-filling is? Then you should definitely check out ‘part 3’ of our NLU benchmarks series! Hold tight, it’s coming soon!)
¹ LUIS, API.ai and Watson metrics were obtained in 2017 by Braun et al. and published in Evaluating Natural Language Understanding Services for Conversational Question Answering Systems, part of the SIGDIAL 2017 proceedings.