Original article was published on Artificial Intelligence on Medium
Today, chatbots are one of the most popular and promising mediums for interaction between humans and machines. A chatbot is essentially a piece of software (more precisely, Artificial Intelligence software) that can simulate a conversation with a user in a natural language through messaging applications, mobile/web applications, telephone, etc. We interact with bots like these all the time. Often, we know that the entity we are interacting with is a bot, as in the case of virtual assistants like Alexa or telephone support. But some bots are remarkably good at concealing their identity or imitating a human-like interaction. With the popularity of digital assistants, voice interfaces have become a mainstream technology. A lot of people find these assistants or bots very useful in completing everyday tasks.
In this tutorial, I will go over how we can define a simple conversational interface for our bot using the Amazon Lex service. In the coming tutorials, we will see how to use AWS Lambda to implement the functional aspects of our bot, how to create new versions of the bot to allow for continued development, and how to gracefully handle cases where the bot has difficulty understanding the user.
What is AWS and Amazon Lex?
Amazon Web Services or AWS is a subsidiary of Amazon that provides on-demand, secure cloud computing platforms, and APIs offering compute-power, data storage, content delivery, and other functionalities to individuals, companies, and governments, on a metered pay-as-you-go basis.
Amazon Lex is a service provided by AWS for building conversational interfaces into web and mobile applications using text and voice. It is built on the same deep learning technologies that power the Amazon Alexa virtual assistant, which most people know through Amazon Echo devices. Since this service is managed by AWS, there is no provisioning or scaling that needs to be done for it; AWS takes care of all that. All we need to do is create the interface, and AWS will make it available to the user, ready to be used. The two key capabilities used by Amazon Echo, the Lex service, or any digital assistant for that matter are ASR and NLU. We will understand both of these better in the following section.
Understanding ASR and NLU
To understand ASR and NLU, we first need to know what AI, or Artificial Intelligence, is. Apart from being the new cool kid on the block that everyone is talking about these days, AI is the intelligence demonstrated by machines, in contrast to the commonly accepted notion that associates intelligence with living creatures like humans and animals. The term refers to machines, algorithms, or software that can mimic the cognitive functions of the human brain, like learning, decision making, and problem-solving. NLP, or Natural Language Processing, is a sub-field of AI, information engineering, and computer science. NLP is concerned with the interactions between users and machines, and with how machines understand, process, analyze, and produce output for data available in natural language.
NLU or Natural Language Understanding is a type of NLP that allows machines to understand text and speech in a natural language just like a human brain would understand it. ASR or Automatic Speech Recognition, on the other hand, is a sub-field of computer science and computational linguistics that concerns technologies enabling machines to recognize and transcribe spoken language into text. It is also known as speech recognition, computer speech recognition, or speech to text (STT). These two features have been used in conjunction and independently for years to develop digital assistants, transcription services, conversational interfaces, etc.
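To make the NLU step a little more concrete, here is a deliberately tiny sketch of what "understanding" means in practice: turning a transcript (the kind of text an ASR system would produce) into a structured intent. The intent names, the keyword table, and the `understand` function are all hypothetical and invented for illustration. Real NLU systems, including the one behind Lex, rely on trained models rather than keyword counting.

```python
import re

# Hypothetical keyword rules, for illustration only.
# A real NLU model is learned from data, not hand-written like this.
INTENT_KEYWORDS = {
    "FindRestaurant": {"restaurant", "eat", "dinner"},
    "CheckWeather": {"weather", "rain", "forecast"},
}


def understand(transcript: str) -> dict:
    """Map a text transcript to the intent whose keywords it mentions most."""
    # Lowercase and split into words so matching ignores case and punctuation.
    words = set(re.findall(r"[a-z']+", transcript.lower()))

    best_intent, best_score = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(words & keywords)  # number of keywords present
        if score > best_score:
            best_intent, best_score = intent, score

    # best_intent stays None when no keyword matched at all.
    return {"intent": best_intent, "matched_keywords": best_score}


if __name__ == "__main__":
    print(understand("Where can I eat dinner tonight?"))
    print(understand("Will it rain tomorrow?"))
```

In a voice interface, ASR would run first to produce the transcript, and a function playing the role of `understand` would run on its output.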
Understanding Lex Terminology
To build a chatbot using Amazon Lex, we first need to be familiar with its terminology. You might wonder why it is worth spending a few minutes on terminology up front: can’t I just wing it as I design my bot? The answer is both yes and no. Yes, you can wing it and jump straight into designing the interface; that will not stop your bot from working. However, not knowing the terminology and the characteristics of the features on offer may leave your bot several steps behind when it comes to user experience. As with any job, art form, or simple task, working only with what is “needed” or “required” will produce a Minimum Viable Product at best. To make a product that is both “needed” and “desired”, it is important to know the finer details and how each of them adds to the user experience.
Starting simple, here are five important terms that we will come across again and again while designing our conversational interface: intents, utterances, slots, prompts, and fulfillment.
- Intent: In simple words, the intent is what we want our bot to do. When we interact with a bot or any conversational interface, we expect something in return, such as having a task performed or receiving some information. Similarly, when we are building or designing a bot, we want it to do something particular. The intent is what we expect as the end product of the interaction between the bot and the user. Intents are of two types: Custom Intents and Built-in Intents. Custom Intents are features designed and developed by the creator/developer of the bot, specific to that particular bot. Built-in Intents, on the other hand, are more generalized features designed, developed, and maintained by AWS/Amazon and are available to use in any Lex bot. One may or may not use built-in intents, depending on the functional needs of the bot. Built-in intents are named like AMAZON.IntentName. An example of a built-in intent is AMAZON.CancelIntent. When invoked, this intent stops the processing in the bot and throws away any parameter or value that the bot had gathered so far. It is not necessary to define all the intents at once. We can go on adding intents to our bot as the needs arise. Another way to look at intents is as features of a product.
- Utterances: Utterances are what we use to make our bot understand our intent, that is, what we need from it. Intents are invoked using utterances. Imagine a conversation between you and a friend. Suppose you want to ask them which is their favorite Italian restaurant in the city. I alone can think of at least a dozen ways of asking this just in the English language. Each of these dozen ways of asking for the best Italian restaurant in the city is called an utterance. Now consider the number of languages in the world. Into this mix, throw cultural diversity, dialects, user background, language proficiency, sentence construction (order vs request, phrasing, etc.), and a lot of other factors. This is the cocktail that we try to feed our bot using utterances. Utterances are the various ways in which the user can request the same information or service from the bot. A well-designed bot should be inclusive of all the physiological and cultural diversity in its user base. The more utterances provided for an intent, the better. AWS has set a limit of 1,500 (I think) on the number of utterances that may be provided for a particular intent. While 1,500 is a huge number, anything between 8 and 12 utterances should be a good place to start. The best way to go about it is to think outside the box. Let your mind wander and come up with as many different ways of asking the same thing as you can. Also, go to your users (or potential users), strike up a conversation with them, and notice how they phrase the same question or sentence. Ask them how many ways they can think of to frame it. Add these to your sample utterances, and iterate. Over its lifetime, your bot will get smarter and learn to react to newer utterances, but it does not hurt to give it a good sample to learn from.
- Slots: In the real world, when you go to a bank to withdraw or deposit money from your account, you are supposed to fill out a form asking you for some information. Similarly, in more traditional digital applications, we are given forms to fill out whenever the application needs some information from us. Each of these forms, both in the real and the digital world, is made of labels and inputs. Labels specify what information needs to be provided, and an input is a way to impose some restrictions on the type of information that can be provided for a particular label. This helps avoid errors. Now, what does a chatbot do when it needs to request some information from the user? With no visual or physical manifestation of a form template, are we at a loss? Well, not really. This is where Slots & Prompts come into the picture. Prompts act as (form) labels and indicate what information is being requested. Slots are parameters representing the information requested from the user; they serve as validated inputs through which the user provides that information to the bot. Slots can be thought of as variables that are assigned values in the form of information provided by the user. Slots, like intents, are of two types: Custom Slots and Built-in Slots. Custom Slots can have validation, such as the set of allowed values, value types, etc. Built-in slots are named like AMAZON.SlotName and are used as is, without customization.
- Fulfillment: Fulfillment is what the bot aims at when any intent is invoked. An intent is ready to be fulfilled once all of its mandatory slots have been assigned values from information collected from the user. Once an intent is fulfilled, the bot returns the desired information or appropriate output value.
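To see how these terms fit together, here is a minimal Python sketch that models them as plain data structures. It is not the Lex API: the `Slot` and `Intent` classes, the helper functions, and the `FindRestaurant` example with its slots and prompts are all hypothetical names made up for illustration. Utterances are matched by exact comparison after normalization, whereas Lex generalizes from the samples you give it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str              # variable that will hold the user's answer
    prompt: str            # question the bot asks to fill this slot
    required: bool = True  # mandatory slots must be filled before fulfillment


@dataclass
class Intent:
    name: str
    utterances: List[str]                  # sample phrasings that invoke the intent
    slots: List[Slot] = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase, drop basic punctuation, and collapse whitespace."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return " ".join(cleaned.split())


def match_intent(intents: List[Intent], user_text: str) -> Optional[Intent]:
    """Return the first intent with a sample utterance equal to the input."""
    wanted = normalize(user_text)
    for intent in intents:
        if any(normalize(u) == wanted for u in intent.utterances):
            return intent
    return None


def next_prompt(intent: Intent, collected: Dict[str, str]) -> Optional[str]:
    """Return the prompt for the first unfilled mandatory slot, if any."""
    for slot in intent.slots:
        if slot.required and slot.name not in collected:
            return slot.prompt
    return None


def is_ready_for_fulfillment(intent: Intent, collected: Dict[str, str]) -> bool:
    """An intent can be fulfilled once no mandatory slot is left to ask for."""
    return next_prompt(intent, collected) is None


# Hypothetical example intent, based on the restaurant question above.
find_restaurant = Intent(
    name="FindRestaurant",
    utterances=[
        "Find me a restaurant",
        "Where should I eat?",
        "Recommend a place to eat",
    ],
    slots=[
        Slot("Cuisine", "What kind of food would you like?"),
        Slot("City", "Which city are you in?"),
    ],
)

if __name__ == "__main__":
    intent = match_intent([find_restaurant], "where should I eat")
    print(intent.name if intent else "no intent matched")
    print(next_prompt(find_restaurant, {}))
    print(is_ready_for_fulfillment(
        find_restaurant, {"Cuisine": "Italian", "City": "Rome"}))
```

The bot keeps asking the prompt returned by `next_prompt` until it returns `None`, at which point every mandatory slot has a value and the intent can be fulfilled.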
We are now all set to begin designing our conversational interface using Amazon Lex. In the next tutorial, we will learn how to define intent, configure utterances, and define slots and prompts for our chatbot and have our conversation interface design ready. The chatbot we will design in the next tutorial will help us find a doctor given certain conditions. Working with a real-world example will give more substance and a better functional understanding of the aforementioned terms.
I am very thankful for Mike Ericson’s tutorial on Getting Started with Amazon Lex on Pluralsight which helped me understand all the nitty-gritty of building a bot using Amazon Lex and putting this tutorial together.