Microsoft Research Wants You to Use Natural Language to Access Web APIs


Is natural language the ultimate protocol for application programming interfaces (APIs)? The idea is intriguing. Web APIs have become an omnipresent part of modern software architectures. Conceptually, each API defines a protocol to access data and a semantic model to interpret it. Over the years, universal protocols have been developed to abstract common capabilities of APIs. For instance, GraphQL provides a generic protocol to access data via a Web API. However, those semi-generic protocols still rely on their own syntax and semantics to encode and process data. With the rise in popularity of conversational interfaces, the idea of using natural language to interact with Web APIs has been gaining traction. Recently, a group of artificial intelligence (AI) researchers from Microsoft published a research paper that proposes a clever architecture for building natural language interfaces for Web APIs.

The idea of creating a conversational protocol for Web APIs is certainly interesting, but it is not without challenges. The protocol of a Web API establishes a constrained structure to interact with resources. However, the same API call can be expressed in a practically unlimited number of ways using natural language. On top of that, APIs typically use parameters to customize a specific action, which makes the challenge even harder: every combination of parameters has its own set of representations in natural language. For instance, consider a scenario in which we are using a CRM API to retrieve information about specific accounts. Natural language expressions such as “Who is the contact for account A?”, “Could you find me the contact for account A?” and “Who represents account A?” are all representations of the same API call.

The second challenge with conversational interfaces for Web APIs is related to the supervised nature of natural language processing (NLP) models. In order to train a natural language interface for a Web API, a system needs access to high-quality labeled data about that API, and such data is hard to find.

NL2API: A Framework for Natural Language Interfaces for Web APIs

In their research, the Microsoft team introduced a framework called NL2API that uses deep neural networks to infer API calls from natural language sentences. The core architecture of NL2API is based on encoder-decoder models, with a twist: the decoder is decomposed into multiple interpretable components called modules. Each module specializes in predicting a predefined kind of output, for example, instantiating a specific parameter by reading the input utterance.

Architecturally, NL2API’s modules are specialized neural networks, each dedicated to performing a specific task based on a specific set of parameters. In our CRM example, suppose that we are processing an utterance like “Give me the accounts with revenues over $1M in the last year and group them by city”. In that example, commands such as GET(Accounts), FILTER(Revenue > $1M) or GROUPBY(City) can all be considered individual modules. In simple terms, instead of using a single decoder to process the whole sentence, as traditional encoder-decoder architectures do, the NL2API model uses different decoders to predict specific parameters, which helps to improve the semantic richness of the model.
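
To make the idea of modules more concrete, here is a minimal Python sketch of how the example utterance might look once each module has produced its parameter. The ModuleOutput class, the render helper and the textual format are illustrative assumptions, not NL2API's actual data structures. In the real framework each parameter would be predicted by a neural decoder rather than written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleOutput:
    """One parameter produced by one module (hypothetical structure)."""
    module: str    # e.g. "GET", "FILTER", "GROUPBY"
    argument: str  # the parameter value the module filled in


def render(outputs):
    """Join module outputs into a readable, canonical command string."""
    return " ".join(f"{o.module}({o.argument})" for o in outputs)


# Hand-written target for the utterance:
# "Give me the accounts with revenues over $1M in the last year
#  and group them by city"
target = [
    ModuleOutput("GET", "Accounts"),
    ModuleOutput("FILTER", "Revenue > $1M"),
    ModuleOutput("GROUPBY", "City"),
]

print(render(target))
```

Keeping one small record per module is what makes the prediction interpretable: each piece of the final call can be traced back to the module that produced it.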

Another important component of the NL2API framework is the controller, which is responsible for determining which modules will be triggered for a specific utterance. Like the modules, the controller is implemented as an attentive decoder. Using the encoding of the utterance as input, it generates a sequence of modules, called the layout. The modules then generate their respective parameters, and finally the parameters are composed to form the final API call.
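
The flow from utterance to layout to parameters to API call can be sketched in a few lines of Python. This is only a toy, under strong assumptions: the controller and the modules below are keyword and regular-expression stand-ins for what NL2API implements as attentive neural decoders, and every name here (TRIGGERS, controller, to_api_call and so on) is hypothetical.

```python
import re

# Stand-in for the controller: a keyword lookup decides which modules
# to trigger. In NL2API this is a learned attentive decoder.
TRIGGERS = [
    ("FILTER", r"\bover\b"),
    ("GROUPBY", r"\bgroup\b"),
]


def controller(utterance):
    """Return the layout: the ordered list of modules to run."""
    layout = ["GET"]  # assumption: every call retrieves some resource
    for module, pattern in TRIGGERS:
        if re.search(pattern, utterance, flags=re.IGNORECASE):
            layout.append(module)
    return layout


# Stand-ins for the modules: each reads the utterance and fills in
# exactly one parameter.
def get_module(utterance):
    match = re.search(r"\bthe (\w+)", utterance)
    return match.group(1).capitalize() if match else "Unknown"


def filter_module(utterance):
    match = re.search(r"(\w+?)s? over (\$?\w+)", utterance)
    if not match:
        return ""
    return f"{match.group(1).capitalize()} > {match.group(2)}"


def groupby_module(utterance):
    match = re.search(r"by (\w+)", utterance)
    return match.group(1).capitalize() if match else ""


MODULES = {"GET": get_module, "FILTER": filter_module, "GROUPBY": groupby_module}


def to_api_call(utterance):
    """Run the controller, then each module in the layout, then compose."""
    layout = controller(utterance)
    return " ".join(f"{name}({MODULES[name](utterance)})" for name in layout)


utterance = ("Give me the accounts with revenues over $1M "
             "in the last year and group them by city")
print(controller(utterance))
print(to_api_call(utterance))
```

The point of the sketch is the separation of concerns: the controller only decides which modules run, and each module only fills in its own parameter.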

Training NL2API

At the beginning of this article, I mentioned that one of the biggest challenges in developing natural language interfaces for Web APIs is the lack of high-quality labeled data. The modular approach proposed by Microsoft’s NL2API also helps with this challenge. Given a specific Web API, NL2API first generates a series of sample calls and decomposes them into canonical modules using a simple grammar. After that, the system relies on crowdsourcing to paraphrase the resulting canonical commands into natural utterances.
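
As a rough illustration of the first step, the sketch below turns a sample call into a canonical command using a tiny template grammar. The GRAMMAR templates and the (module, argument) representation are assumptions made for this example; the grammar used in the paper is not reproduced here.

```python
# Hypothetical grammar: one canonical English phrase per module.
GRAMMAR = {
    "GET": "get {arg}",
    "FILTER": "whose {arg}",
    "GROUPBY": "grouped by {arg}",
}


def canonical_command(api_call):
    """Turn a list of (module, argument) pairs into a canonical sentence."""
    return " ".join(GRAMMAR[module].format(arg=arg) for module, arg in api_call)


sample_call = [
    ("GET", "accounts"),
    ("FILTER", "revenue is over $1M"),
    ("GROUPBY", "city"),
]

print(canonical_command(sample_call))
```

A canonical command like this is stilted but unambiguous, which is what makes it a reasonable prompt for crowd workers to rephrase in their own words.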

The crowdsourcing approach to training is clever but certainly not economical, as the combinatorial explosion of parameters in any API makes it almost impossible to annotate every call. To address this challenge, NL2API uses a hierarchical probabilistic model for the crowdsourcing process, which provides the information needed to decide which API calls to annotate. The NL2API researchers call this approach the Semantic Mesh, as it is computationally represented as a mesh connecting the possible API call and parameter combinations. The Semantic Mesh gives a holistic view of the whole API call space as well as the interplay of utterances and API calls, based on which trainers can selectively annotate only a subset of high-value API calls. In the initial testing, the Semantic Mesh approach outperformed more traditional models such as Seq2Seq.
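
To get a feel for the combinatorial explosion, the following sketch counts the distinct calls for a single endpoint. The parameter names and candidate values are made up for illustration and do not come from any real API.

```python
from itertools import product

# Hypothetical parameters for one endpoint, each with a few candidate
# values. None means "parameter not used in this call".
PARAMETERS = {
    "filter": [None, "revenue > $1M", "city = Seattle", "owner = me"],
    "orderby": [None, "revenue", "name", "created"],
    "groupby": [None, "city", "industry"],
    "top": [None, 5, 10],
}

# Every combination of one value per parameter is a distinct API call.
all_calls = list(product(*PARAMETERS.values()))
print(len(all_calls))  # 4 * 4 * 3 * 3 = 144
```

Even this tiny endpoint yields 144 distinct calls, each of which would need several paraphrases, so annotating only a carefully chosen subset quickly becomes a necessity.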

The Microsoft Research team tested NL2API by generating natural language interfaces for the popular Microsoft Graph API suite. The results suggest that the NL2API model could be a viable approach to enabling the first generation of natural language interfaces for Web APIs.

Source: Deep Learning on Medium