Source: Deep Learning on Medium
Let’s make an AI ‘FAQ’ Chat-bot — Powered by Neural Network
The popularity of chatbots is certainly on the rise. Not so long ago, the FAQ section on any business website was a dull, dark, dungeon-esque place where static information lay in a pile for you to make sense of. Gone are those days. The age of AI is almost upon us.
More and more businesses are shifting their FAQ sections from a ‘self-assisted’ to a ‘bot-assisted’, highly interactive experience. Although not a hundred percent accurate, the bots are managing to answer queries with a respectable level of accuracy. And with increasing research in this field, they’re only going to get better. Trend analysts are predicting their heavy usage and market dominance in the times to come, and there’s no reason to think otherwise.
In 1950, Alan Turing, widely regarded as the father of theoretical computer science and artificial intelligence, proposed the Turing test, a way to check a piece of software’s ability to exhibit intelligent behavior, one that is akin to, and indistinguishable from, human behavior. In their current form and shape, chat-bots are far from clearing the Turing test, but they are getting smarter every day, and they might surprise us.
Okay, enough of the build-up and the pep talk about their abilities; let’s jump right in to see how to make one. But while we are on the topic of the Turing test, I want to throw in one last bit before we start, a little trivia that I found kind of cool. The CAPTCHA, as we know it, is also a sort of Turing test. Here’s its full form: “Completely Automated Public Turing test to tell Computers and Humans Apart”. But in it, a computer is evaluating whether you are a human, and not the other way around. Strange, isn’t it?
Anyways, with that being said, let’s begin. There are various kinds of chat-bots in the market, many supremely sophisticated, like Google Assistant or Amazon Alexa. But the kind usually needed for an FAQ section need not be that sophisticated, because its domain is defined. They are built for a specific purpose. They will only entertain questions on the topics they are trained to answer. But the key is that the questions are NOT hard-coded. Users are free to type a question in whichever form they wish, as long as it is semantically the same.
For instance, to get information about the working hours of a business, two people can ask the same question in two different ways:
Person 1: What time are you open?
Person 2: What are your working hours?
Notice that both these questions are the same, just framed differently, with a different choice of words. The FAQ chat-bot should interpret them as the same, and prompt the same reply.
Additionally, a question can be asked in any grammatical order, or framed with the help of any kind of auxiliary words. The chatbot has to account for all of that and find the main words in the sentence to know the real meaning behind what is being asked.
Person 1: What’s there on the menu?
Person 2: Could you show me your menu?
Person 3: Where should I go to check your menu?
Notice that here there are many auxiliary words, but the main word is “menu”. That is the kind of word the chat-bot should pay attention to and spot.
So, even though the bot is not required to answer everything under the sun, tackling this range of questions is by no means a walk in the park. That’s where the power of artificial intelligence comes in.
We talked about the functional aspects; now let’s cover the technical ones. I’ve uploaded the code to my GitHub account in case you want to refer to it side by side.
Generating Training Data
Right then, the first thing we need to do is prepare the training data. The heart of machine learning and AI is to train a model on a wide range of training data so that the model learns from it and makes predictions on data it has never seen before. To achieve this, I have created a training.json file where I have classified the probable “intents” a user might have behind posting a question.
I am pretending to make a bot for a pizza shop. Its customers can post a “greeting” message, a “goodbye” message, a question about the “menu”, ask for the “pricing”, command the bot to book an “order”, ask for the “name” of the operator, and so on. Before starting any work, we need to think of all the possible intents a visitor might have on our chat portal.
Once we have the intent list, we can add the possible patterns of the queries/questions a visitor might pose. Say, for a greeting: there are many possible ways to greet someone, and different people will say different things, like “hello”, “hi”, “what’s up”, “good morning”, etc. We need to put in as many as we can think of. Mind you, the bot is not restricted to only these patterns; these are just the training data. Based on this set of words, the bot will also catch words that are not in the list but have the same meaning, and classify them as a “greeting” message. For example, the word “hey” was not in the greeting list, but the bot will catch that one too.
After that, we need to put in the responses. These are the sentences the bot will send back when it sees a message matching a particular intent. To make the user feel the bot is making a different “greeting” reply every time, we can put in a range of replies and choose one at random.
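To make this concrete, here is a minimal sketch of the shape such a training.json file could take. The “intents”/“tag”/“patterns”/“responses” layout is a common convention for this kind of tutorial bot; the exact tags and strings below are illustrative, not copied from my repository.

```json
{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Hi", "Hello", "Good morning", "What's up"],
      "responses": ["Hello! Welcome to our pizza shop.", "Hi there, how can I help?"]
    },
    {
      "tag": "menu",
      "patterns": ["What's there on the menu?", "Could you show me your menu?"],
      "responses": ["We serve Margherita, Pepperoni and Farmhouse pizzas."]
    }
  ]
}
```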
Once we have our training data set ready, we then have to import certain Python modules (or install them if you don’t have them already). We would need:
json: to parse our JSON training data set.
nltk: the Natural Language Toolkit (to stem the words, which we’ll talk about below).
random: to make a random choice among responses.
tensorflow: the neural network implementation.
tflearn: a deep learning library that eases working with TensorFlow.
numpy: array management.
pickle: object serialization, to cache the pre-processed data.
The Python script has three steps:
- Data pre-processing
- Training the neural network
- Chatting with the user
Step 1: Data pre-processing
In this section, we need to do a bunch of things on the data so that it becomes ideal to feed into the neural network for deep learning.
Read the JSON training data and collect all the words across all patterns and tags into a single list. We’ll call it our vocabulary list. It helps us gauge the size of the input data we’re playing with.
Stemming
Stemming is a feature of the natural language toolkit (nltk) that brings any word down to its most root form. For example, in “What’s”, the main word is “What”. Stemming also strips extra characters and punctuation marks, trimming the word down. We don’t want to feed our model unnecessary words that complicate matters, so this process is crucial to keep the input clean.
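The real script uses nltk’s stemmer; purely to illustrate the idea, here is a toy stand-in that lowercases, strips punctuation, and chops a few common suffixes (the rules below are mine, far cruder than nltk’s):

```python
import re

SUFFIXES = ("ing", "ed", "ly", "es", "s")

def crude_stem(word):
    # Toy stand-in for nltk's LancasterStemmer: lowercase, strip
    # punctuation (so "What's" becomes "whats"), then chop a common suffix.
    word = re.sub(r"[^a-z]", "", word.lower())
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in "What's the pricing?".split()])
# -> ['what', 'the', 'pric']
```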
One hot Encoding
The neural network only understands numbers. The stemmed vocabulary list we have gathered makes no sense to the underlying network, so we will have to convert the word list into numbers. This process is called one-hot encoding. It simply means that if a word exists, we encode it with a 1 (hot bit), and if it doesn’t, with a 0 (cold bit). We need to generate this encoding for every corresponding query pattern.
For example, let’s say we have a total vocabulary of 6 words.
[“hello”, “good”, “price”, “morning”, “age”, “name”]
Now say the pattern (remember, the patterns are the queries or questions asked by the user) is “Good morning”. We will encode it as:
[0, 1, 0, 1, 0, 0]
We place 1’s at the vocabulary positions where the individual words of the pattern are present.
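Here is a minimal sketch of that encoding (the helper name and the naive tokenization are mine, simplified for illustration):

```python
vocab = ["hello", "good", "price", "morning", "age", "name"]

def bag_of_words(sentence, vocabulary):
    # Tokenize naively, lowercase, strip trailing punctuation,
    # then mark each vocabulary slot hot (1) or cold (0).
    words = [w.lower().strip("?!.,'") for w in sentence.split()]
    return [1 if v in words else 0 for v in vocabulary]

print(bag_of_words("Good morning", vocab))  # -> [0, 1, 0, 1, 0, 0]
```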
Bag of Words
If we repeat the one-hot encoding for every pattern in our training data, we get many individual encoded lists. We need to put them in a bag of words (another machine learning term). The bag represents the absence of order: once we put items in a bag, we cannot be sure of the order they end up in. The position is purely arbitrary.
In the same sense, the neural network doesn’t really care in which order a word appeared. All that matters is whether the word exists or not. This is a common approach for classifying documents based on their content.
Note: We’ll have to convert the output into a bag of words too. The output in our case is the intent of the user behind posting a query; we are essentially trying to classify the intent behind what the user typed. If the user types “Confirm my order”, the intent is “order”. If they type “See you later”, the intent is “goodbye”, and so on.
So, if we have, say, five intents:
[“greeting”, “goodbye”, “order”, “cost”, “menu”]
Then, for every pattern, we need to encode the corresponding output. Say the pattern is “how much for this pizza?”. We know this query asks for the ‘cost’ of the pizza, so we encode its matching output as:
[0, 0, 0, 1, 0]
Note that the pattern was for the intent “cost”, hence we placed a 1 at that location in our intent list.
This encoding process needs to be repeated for every pattern and its corresponding output.
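The output encoding is the same one-hot idea applied to the intent list; a small sketch (the helper name is mine):

```python
intents = ["greeting", "goodbye", "order", "cost", "menu"]

def encode_intent(tag):
    # 1 at the position of the matching intent, 0 everywhere else.
    return [1 if t == tag else 0 for t in intents]

print(encode_intent("cost"))  # -> [0, 0, 0, 1, 0]
```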
Caching of pre-processed data
Finally, the last item in the data pre-processing step is to cache the pre-processed data, so that we don’t have to repeat the entire process every time a customer chats with our bot.
To do that, we’ll use the pickle module. The stored objects (the vocabulary list, the bag of words of input patterns, the bag of words of output intents, etc.) are serialized and stored in a file, which we’ll reuse.
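As a sketch (the file path and variable names below are illustrative, not the ones from my repository), the caching step could look like this:

```python
import os
import pickle
import tempfile

# Pre-processed objects (toy values here; in the real script these
# come out of the steps above).
words = ["hello", "good", "price", "morning", "age", "name"]
training = [[0, 1, 0, 1, 0, 0]]
output = [[1, 0, 0, 0, 0]]

path = os.path.join(tempfile.gettempdir(), "data.pickle")

# First run: serialize everything in one shot.
with open(path, "wb") as f:
    pickle.dump((words, training, output), f)

# Later runs: load the cache instead of re-processing.
with open(path, "rb") as f:
    words, training, output = pickle.load(f)
```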
Step 2: Training the Model
Right then, everything is set to feed some nutritious fodder to those hungry neurons. But first, let’s try to understand how a neural network works.
Biological neurons are the basic working units of the brain: specialized cells designed to transmit information to other nerve cells. This communication forms our nervous system (the biological neural network), which essentially makes our brain do all those wonderful things.
The artificial neural network is designed on the same principles. A neural network is fundamentally a network of neurons arranged in layers. Neurons at every layer are assigned a specific computational task, and based on the result of this computation, their job is to activate the neurons of the next layer. Super confusing, right? Let’s take an example: the “hello world” of neural networks, the classification of hand-written digits.
In this example, our task is to design a neural network that can recognize handwritten digits. The input is a 28 × 28 pixel grid where we draw our image, giving 784 different pixel values. Depending on what you draw on the input frame, some of those 784 pixels will be brightened.
In our neural network, we have designed 4 layers. The first is the input layer, the next 2 are hidden layers, and the last is the output layer. The neurons at every layer are connected to every neuron in the next layer.
The first layer, the input layer, will have 784 neurons whose values are the intrinsic brightness of the pixels they represent. The value can range from 0 (completely dark) to 1 (completely bright).
The next layer (the 2nd) is a hidden layer we have designed with 16 neurons. Every neuron in this hidden layer has a dedicated region of the image assigned to it. The task of the neurons in this layer is to recognize edges in their specific region: if the brightness of the pixels in its dedicated region goes above a certain threshold, that region’s neuron is activated.
The next layer (the 3rd) is also a hidden layer with 16 neurons. Every neuron in this layer is assigned the task of detecting patterns based on the edge inputs. If there is a pattern like a circle in the upper half of the image, a specific neuron in this layer is activated.
The last layer is the output layer, which contains 10 neurons representing the 10 digits we want to classify our image into. Based on the pattern inputs from the 3rd layer, a specific neuron is activated in this layer. Say we have a circle pattern in the upper half of the image and a straight line on the right side of the lower half: the likelihood of the digit 9 is high, so that neuron of the output layer is activated.
You see how elegant this is: a big, complex problem is broken down into layers of abstraction, with a specific task assigned at each layer to achieve the final objective. I hope this helped in understanding the hidden mysteries of the neural network.
Moving on: for our chat-bot, we have to do fundamentally the same thing. The input layer is the bag of words of the patterns; the number of neurons is the size of our vocabulary list. The output layer is the bag of words of the intents corresponding to those patterns; the number of neurons is the size of our intent list.
We will also have 2 hidden layers with 8 neurons each, which will figure out by themselves how to get from the input neuron values to the output neuron values. That’s where the brilliance of machine learning comes in.
But in the end, it’s all math, repeated at monumental proportions. If you are interested in the mathematical details of neural networks, I recommend this series by a math professor.
When the model is trained on our training data, it spits out its output in terms of probabilities: how probable every intent is, given the question pattern the user asked. We simply pick the intent with the highest probability.
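The real model is built with tflearn, but the shape of the computation can be sketched with plain numpy. The weights below are random stand-ins for what training would learn; the point is only the flow from a vocabulary-sized input, through two hidden layers of 8 neurons, to a softmax that yields one probability per intent:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden, n_intents = 6, 8, 5

# Random weights stand in for the values tflearn would learn in training.
W1 = rng.normal(size=(vocab_size, hidden))
W2 = rng.normal(size=(hidden, hidden))
W3 = rng.normal(size=(hidden, n_intents))

def forward(bag):
    h1 = np.maximum(0, bag @ W1)   # hidden layer 1 (8 neurons, ReLU)
    h2 = np.maximum(0, h1 @ W2)    # hidden layer 2 (8 neurons, ReLU)
    logits = h2 @ W3
    e = np.exp(logits - logits.max())
    return e / e.sum()             # softmax: one probability per intent

probs = forward(np.array([0, 1, 0, 1, 0, 0], dtype=float))
# probs has 5 entries summing to 1; the argmax is the predicted intent
```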
Step 3: Chatting with the user
I must admit, this is the easiest of the steps. By now, our model has been trained on the training data we provided, and it is ready to predict the intent behind sentences thrown at it by an unknown customer.
For simplicity’s sake, we will interact with the user on a command line interface. We’ll ask the user to chat with the bot by posting a query. Naturally, the user will type the query in the natural language. Remember, the neural network only understands numbers. So, we need to convert this natural language query into something that the network understands.
We need to stem the words of this query, one-hot encode them against our vocabulary, and convert them into a bag of words for this new, unseen pattern. Finally, we feed it into the network, and it spits out a probability for every intent. We choose the maximum one.
Then, in that intent section, we choose a random response and throw it back to the user. Sweet.
To fine-tune it further, we will say “We didn’t get you” if the user posts some random message that doesn’t match any of the intents we have. This can be done easily by checking the probability of our maximum intent output neuron: only if it is above a threshold (say 0.7, i.e., more than 70% probability) do we show a real response; otherwise, we ask the user to ‘try again’.
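A sketch of this thresholding logic (the intent names, responses, helper name, and the 0.7 cut-off below are illustrative):

```python
import random

intents = ["greeting", "goodbye", "order", "cost", "menu"]
responses = {  # hypothetical responses keyed by intent
    "cost": ["A large pizza is $12.", "Prices start at $8."],
}

def pick_reply(probs, threshold=0.7):
    # Find the most probable intent; answer only if we are confident.
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "Sorry, we didn't get you. Could you try again?"
    return random.choice(responses[intents[best]])

print(pick_reply([0.05, 0.05, 0.05, 0.80, 0.05]))  # one of the "cost" replies
print(pick_reply([0.30, 0.20, 0.20, 0.15, 0.15]))  # falls back to "try again"
```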
My lords and ladies, there you have it. Your own working FAQ chat-bot which you can customize to handle any kind of queries and answers. Go figure.