Building a Bi-LSTM based classification model

Original article was published on Artificial Intelligence on Medium

Our data consists of a set of utterances (questions), a reply (the output for a given utterance), and an intent (the class to which the utterance belongs).

We want to build a model that correctly classifies a given utterance into one of these intents.

The methodology below implements multi-class text classification using a Bi-LSTM model, where the words are vectorized using GloVe embeddings and fed as a sequence into the LSTM.

Bidirectional Long Short-Term Memory (BI-LSTM) with Attention Mechanism

For a computer to work with text, the text first needs to be vectorized. Vectorization is the process of converting text into numbers that a machine can process. The most commonly used approaches are:

  • Bag of Words
  • TF-IDF method
  • Word Embeddings

Each of these approaches vectorizes text in a different way. Today we will discuss the word embeddings approach.

Download the pre-trained GloVe word vectors from here.

For this model, we used the Common Crawl vectors (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download), glove.840B.300d.zip, but feel free to use any of the other pre-trained sets. You’ll find a snapshot of the data we used below:

In this picture, the utterance column holds the sentence and the intent column holds the class to which the utterance belongs.

One hot encoding of the different classes in the data is shown below:
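
A minimal sketch of the one-hot encoding step, assuming the intents arrive as a plain list of strings. The intent names and the `one_hot_encode` helper are hypothetical and only serve as an illustration:

```python
import numpy as np

def one_hot_encode(labels):
    """Map each label to a one-hot row; returns (matrix, sorted class list)."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype=np.float32)
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return encoded, classes

# Hypothetical intent names, not taken from the article's dataset.
intents = ["greeting", "billing", "greeting", "goodbye"]
y, classes = one_hot_encode(intents)
```

Each row of `y` contains a single 1 in the column of its class, and `classes` records which column belongs to which intent so predictions can be mapped back to names.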

We then load the GloVe vectors into a dictionary, which looks like this:
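
One way this dictionary could be built is sketched below, assuming the usual GloVe text layout of one token per line followed by its space-separated vector values. The `load_glove` helper and the extracted file name `glove.840B.300d.txt` are assumptions. The vector is read from the end of each line because some tokens in large vocabularies may themselves contain spaces:

```python
import io
import numpy as np

def load_glove(lines, dim):
    """Build {word: vector} from an iterable of GloVe-format text lines."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # skip empty or malformed lines
        # The last `dim` fields are the vector; everything before is the token.
        word = " ".join(parts[:-dim])
        embeddings[word] = np.asarray(parts[-dim:], dtype=np.float32)
    return embeddings

# Tiny in-memory stand-in for the real file (3 dimensions instead of 300).
sample = io.StringIO("hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n. . . 0.7 0.8 0.9\n")
glove = load_glove(sample, dim=3)

# With the real download it might look like this (file name is an assumption):
# with open("glove.840B.300d.txt", encoding="utf-8") as f:
#     glove = load_glove(f, dim=300)
```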

We then use these word embeddings to vectorize the sentences in our data. The vectorized sentences are later fed into a Bi-LSTM for classification.
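
As a rough sketch of this step, each sentence can be turned into a fixed-size matrix with one row per word. The whitespace tokenizer, the `max_len` padding length, and the zero vector for unknown words are all assumptions, and the toy embeddings stand in for the real GloVe dictionary:

```python
import numpy as np

def vectorize_sentence(sentence, embeddings, dim, max_len):
    """Turn a sentence into a (max_len, dim) array of word vectors."""
    matrix = np.zeros((max_len, dim), dtype=np.float32)
    tokens = sentence.split()[:max_len]  # simple whitespace tokenizer, truncated
    for i, token in enumerate(tokens):
        vector = embeddings.get(token)
        if vector is not None:  # unknown words stay as zero rows
            matrix[i] = vector
    return matrix

# Toy 2-dimensional embeddings used only for this illustration.
toy_embeddings = {
    "reset": np.array([1.0, 0.0], dtype=np.float32),
    "my": np.array([0.0, 1.0], dtype=np.float32),
    "password": np.array([0.5, 0.5], dtype=np.float32),
}
sentences = ["reset my password", "reset my router please"]
X = np.stack([vectorize_sentence(s, toy_embeddings, dim=2, max_len=4)
              for s in sentences])
```

The tokens are not lowercased here because the chosen GloVe vectors are cased. The result `X` has shape `(number of sentences, max_len, dim)`, which is the sequence layout a recurrent model expects.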

Train the created model using the code below:
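
A minimal sketch of what building and training could look like, assuming Keras (via TensorFlow) as the framework. The layer sizes, dropout rate, optimizer, and epoch count are illustrative guesses rather than the settings used for the results reported here, the training data is random stand-in data, and no attention layer is included:

```python
import numpy as np
from tensorflow.keras import layers, models

def build_bilstm_classifier(max_len, embedding_dim, num_classes, lstm_units=64):
    """Bi-LSTM over pre-vectorized sentences, followed by a softmax classifier."""
    model = models.Sequential([
        layers.Input(shape=(max_len, embedding_dim)),
        # Reads the sequence left-to-right and right-to-left, then concatenates.
        layers.Bidirectional(layers.LSTM(lstm_units)),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Random stand-in data: 32 sentences, 10 words each, 300-d vectors, 4 intents.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(32, 10, 300)).astype("float32")
y_train = np.eye(4, dtype="float32")[rng.integers(0, 4, size=32)]

model = build_bilstm_classifier(max_len=10, embedding_dim=300, num_classes=4)
history = model.fit(X_train, y_train, epochs=2, batch_size=8,
                    validation_split=0.25, verbose=0)
```

The model takes the already-vectorized sentences as input, so no embedding layer is needed. The categorical cross-entropy loss matches one-hot encoded labels.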

Evaluate your model on test-data with the following code:
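
Evaluation can be sketched independently of the framework, assuming the model returns one row of class probabilities per test sentence and the true labels are one-hot encoded. The `evaluate_predictions` helper and the numbers are made up for illustration:

```python
import numpy as np

def evaluate_predictions(probabilities, y_true_one_hot):
    """Return overall accuracy and a confusion matrix (rows: true, cols: predicted)."""
    y_pred = probabilities.argmax(axis=1)
    y_true = y_true_one_hot.argmax(axis=1)
    accuracy = float((y_pred == y_true).mean())
    num_classes = y_true_one_hot.shape[1]
    confusion = np.zeros((num_classes, num_classes), dtype=int)
    for true_class, predicted_class in zip(y_true, y_pred):
        confusion[true_class, predicted_class] += 1
    return accuracy, confusion

# Made-up probabilities for 4 test sentences and 3 intents.
probabilities = np.array([[0.8, 0.1, 0.1],
                          [0.2, 0.7, 0.1],
                          [0.3, 0.3, 0.4],
                          [0.6, 0.3, 0.1]])
y_test = np.eye(3)[[0, 1, 2, 1]]  # true classes: 0, 1, 2, 1
accuracy, confusion = evaluate_predictions(probabilities, y_test)
```

The confusion matrix is worth inspecting alongside the accuracy, since it shows which intents get mistaken for one another.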

Tune your hyperparameters for better accuracy. The table below compares the different approaches used for classification and shows why we chose the Bi-LSTM with word embeddings:

Conclusion

The bag-of-words and TF-IDF approaches do not take the structure of the sentence into account, whereas in RNN-based approaches like the one presented above, the sentence is passed into the model as an ordered sequence of words.
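
A small illustration of this point, using made-up sentences: two sentences with opposite meanings produce identical bag-of-words counts, while their word sequences differ.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count word occurrences, discarding word order."""
    return Counter(sentence.lower().split())

a = "the dog bit the man"
b = "the man bit the dog"

same_bag = bag_of_words(a) == bag_of_words(b)  # counts are identical
same_sequence = a.split() == b.split()         # word order differs
```

Here `same_bag` is `True` and `same_sequence` is `False`, so a count-based model cannot tell the two sentences apart, whereas a sequence model receives different inputs.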

Additionally, the TF-IDF approach with fine-tuned Random Forest and SVC models reached validation accuracies of ~86%, whereas the Bi-LSTM approach reached an accuracy of ~96%.
