Which is the best NLP library?

Original article was published by Rahul Bhattacharya on Artificial Intelligence on Medium

Ever since I started working on NLP (Natural Language Processing), I have wondered which library best meets our most common NLP requirements. There is no one-size-fits-all answer, and the right choice depends on the task at hand, but I was still curious how different libraries would compare when benchmarked on a very simple task.

With that in mind, I put on my developer hat and set out to write Python code using various libraries, evaluating each against a very common task. To keep things simple, I decided to use a Twitter text-classification problem for the evaluation. The most common NLP libraries today are NLTK, spaCy, TextBlob, and Gensim, alongside, of course, deep neural network architectures using LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit) cells.

The problem statement

The dataset I am using consists of a collection of tweets. Some of the tweets are labeled as racist while others are not. This is a classic supervised binary-classification problem. Our job is to build a model with each library and use it to classify previously unseen text as racist or not.

Here is a look at some of the available tweets:

The label 1 means the tweet is racist, and the label 0 means it is not.

For the sake of brevity, I will focus only on the key sections of the code. For the full code, please feel free to visit my GitHub repository. Since I have already cleaned up the dataset and performed the EDA (Exploratory Data Analysis), I will not cover those details here either.

I will be focusing on five different approaches here: NLTK, TextBlob, spaCy with CNN, spaCy with document vectors, and finally a deep neural network model with bidirectional LSTM cells.

I will be using F1-Score as my evaluation metric.
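As a reminder of what this metric measures, here is a minimal, dependency-free sketch of the F1-score for the positive class (label 1): the harmonic mean of precision and recall. The function name `f1_score_binary` and the toy labels are made up for illustration; they are not taken from the tweet dataset or from the original code.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1-score for the positive class.

    F1 is the harmonic mean of precision (how many predicted positives
    are correct) and recall (how many actual positives were found).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration (1 = racist, 0 = not racist).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

score = f1_score_binary(y_true, y_pred)
print(round(score, 3))  # 0.667 (2 true positives, 1 false positive, 1 false negative)
```

In practice a library routine such as scikit-learn's `f1_score` would normally be used; the hand-written version is only meant to make the definition explicit.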

So let’s get started.


NLTK

Let’s start with the NLTK package. Here I use the NLTK library for tokenization and lemmatization of the tweets, and Gensim’s Word2Vec model for creating word vectors. Finally, an XGBoost classifier is used for training and inference. Below is a snippet containing the relevant sections of the code.

Using this technique, I was able to obtain an F1-Score of 0.68.


TextBlob

TextBlob is a beginner-friendly NLP library that provides a lot of cool features, including built-in text classifiers. However, I found TextBlob a lot slower than the other libraries I have used.

For our purpose, we will use the built-in DecisionTreeClassifier of TextBlob. Below is the code snippet:

Using this technique, I was able to get an F1-Score of 0.46.

spaCy with CNN

spaCy is one of the most popular and widely used NLP libraries and offers very powerful features. It offers a built-in text classifier named textcat, which is available as a default pipeline component. The textcat component supports BOW (bag of words), simple CNN (convolutional neural network), and ensemble architectures.

In our example, we will use the simple-cnn architecture.

For this scenario, I get an F1-Score of 0.66.

spaCy with document vectors

In this example, we will continue to use spaCy, but instead of its built-in text classifier, I will use spaCy to generate document vectors and then feed those vectors to an XGBoost classifier. Let’s see if that improves our score.

Here we get an F1-Score of 0.64.
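To make the idea of a document vector concrete: by default, spaCy's `Doc.vector` is an average of the document's token vectors, which gives the classifier one fixed-length feature vector per tweet regardless of its length. The sketch below illustrates only that averaging idea in plain Python. It is not the original code, the helper `document_vector` and the toy 3-dimensional vectors are hypothetical, and skipping tokens that have no vector is a simplification chosen here.

```python
def document_vector(tokens, word_vectors, dim):
    """Average the vectors of all tokens that have one.

    Returns a zero vector of length `dim` when no token is known,
    so every document still maps to a fixed-length feature vector.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the i-th component of every vector together.
    return [sum(component) / len(known) for component in zip(*known)]


# Toy 3-dimensional vectors invented for illustration; real embeddings
# usually have tens to hundreds of dimensions.
toy_vectors = {
    "good": [1.0, 0.0, 2.0],
    "morning": [3.0, 2.0, 0.0],
}

features = document_vector(["good", "morning", "everyone"], toy_vectors, dim=3)
print(features)  # [2.0, 1.0, 1.0] ("everyone" has no vector and is skipped)
```

Each tweet's averaged vector then becomes one row of the feature matrix passed to the classifier.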

Deep Neural Network with LSTM cells

Finally, we will create a neural network model using bidirectional LSTM cells. We will use TensorFlow’s Keras library and its utilities for tokenization and padding of sequences.

Here we get an F1-Score of 0.41.
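For readers unfamiliar with the tokenization and padding steps mentioned above, the following dependency-free sketch shows what they do conceptually: each word is mapped to an integer id, and every sequence is padded with zeros or truncated to a common length so that the tweets can be batched for an LSTM. This imitates the idea only and is not the Keras API; the helpers `build_vocab` and `texts_to_padded` are hypothetical names, and padding and truncating at the front is an assumption made here.

```python
from collections import Counter


def build_vocab(texts):
    """Map each word to an integer id, most frequent word first.

    Ids start at 1 because 0 is reserved for padding.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_padded(texts, vocab, max_len):
    """Turn texts into id sequences of exactly `max_len` entries."""
    padded = []
    for text in texts:
        # Words missing from the vocabulary are dropped.
        ids = [vocab[w] for w in text.lower().split() if w in vocab]
        ids = ids[-max_len:]  # truncate from the front, keep the last ids
        padded.append([0] * (max_len - len(ids)) + ids)  # pad at the front
    return padded


# Toy sentences invented for illustration.
texts = ["good morning", "good morning to you all"]
vocab = build_vocab(texts)
batch = texts_to_padded(texts, vocab, max_len=4)

print(vocab)  # {'good': 1, 'morning': 2, 'to': 3, 'you': 4, 'all': 5}
print(batch)  # [[0, 0, 1, 2], [2, 3, 4, 5]]
```

The resulting integer matrix is the kind of input an embedding layer followed by a bidirectional LSTM layer would consume.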