What is Natural Language Processing(NLP)?

Original article was published by Manik Soni on Artificial Intelligence on Medium

What is Natural Language Processing (NLP)? What is it used for? What is the Bag of Words model? How do you implement NLP?

What is Natural Language Processing(NLP)?

Natural Language Processing (NLP) is an area of computer science and artificial intelligence concerned with the interaction between computers and human languages. It draws on both supervised and unsupervised machine learning approaches.


Teaching machines to understand spoken and written language is the focus of Natural Language Processing (NLP). Whenever you dictate something into your iPhone or Android device and it is converted to text, that is an NLP algorithm in action. Voice assistants such as Apple’s Siri and Amazon’s Alexa are likewise built on NLP algorithms.

Uses of Natural Language Processing(NLP)?

  1. Sentiment analysis: identifying the mood or subjective opinions within large amounts of text, including average sentiment and opinion mining.
  2. Text classification, for example predicting the genre of a book.
  3. Question answering.
  4. Machine translation and speech recognition.
  5. Document summarization.

Main Natural Language Processing (NLP) libraries:

  1. Natural Language Toolkit (NLTK): among other things, it helps to break a sentence down into a tree format (a parse tree).
  2. spaCy
  3. Stanford NLP
  4. OpenNLP

Visualizing the output, for example as a parse tree, shows the relations between words that NLP builds in the background.

What is the NLP model (Bag of Words)?

Bag of Words is a model used to preprocess text before classification: each text is converted into numeric features, and the classification algorithm is then fitted on the observations containing those features.

A Bag of Words model involves two things:

  1. A vocabulary of known words.
  2. A measure of the presence of known words.
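
The two ingredients above can be sketched in a few lines of plain Python. This is only an illustration: the three toy documents and the helper name `to_vector` are made up here, and word counts are used as one possible measure of presence.

```python
from collections import Counter

# Toy documents, made up for illustration.
docs = ["the phone is great", "the battery is bad", "great great screen"]

# 1. A vocabulary of known words: every distinct token, in sorted order.
vocabulary = sorted({word for doc in docs for word in doc.split()})

# 2. A measure of presence: here, how often each vocabulary word occurs.
def to_vector(doc):
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

vectors = [to_vector(doc) for doc in docs]

print(vocabulary)
for doc, vector in zip(docs, vectors):
    print(vector, "<-", doc)
```

Word order is discarded: each document becomes a fixed-length list of counts, one position per vocabulary word.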

How to implement Natural Language Processing(NLP)?

To show how Natural Language Processing (NLP) can be implemented, we take one of its applications: sentiment analysis.

Dataset used: We are taking an Amazon review dataset, which you can find on websites such as Kaggle. It contains customers' opinions about particular products, and we check the polarity (positive or negative) of each review.

Step 1. Import the Libraries.

Step 2. Importing the dataset.
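
A minimal sketch of Steps 1 and 2, assuming pandas is used for loading. The column names `Reviews` and `Rating` and the three sample rows are assumptions made for illustration; an in-memory string stands in for the downloaded CSV file so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV file. With a real file you would pass
# its path to pd.read_csv instead of this in-memory text.
csv_text = """Reviews,Rating
"Great phone, works perfectly",5
"Stopped working after a week",1
"Okay for the price",3
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```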

Step 3. Data Preprocessing: Add another column named ‘Positive’, which categorizes each review according to its rating (positive if the rating is greater than 3, otherwise negative).
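
One way to derive the label column, sketched with pandas. The `Reviews` and `Rating` column names and the sample rows are assumptions; only the rule (rating above 3 means positive) comes from the step itself.

```python
import pandas as pd

# Made-up sample rows; the column names are assumptions.
df = pd.DataFrame({
    "Reviews": ["Great phone", "Stopped working", "Okay for the price", "Good value"],
    "Rating": [5, 1, 3, 4],
})

# 1 if the rating is greater than 3, otherwise 0.
df["Positive"] = (df["Rating"] > 3).astype(int)
print(df)
```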

Step 4. Extracting the meaningful words and removing punctuation, conjunctions, and other stop words.
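
A rough sketch of this cleaning step using only the standard library. The function name `clean_review` and the tiny stop-word set are hypothetical; in practice a fuller list, such as the one shipped with NLTK, would likely be used.

```python
import re

# A deliberately small, hand-written stop-word set (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "is", "it", "this", "was", "i", "to", "of"}

def clean_review(text):
    text = text.lower()
    # Replace everything except lowercase letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Keep only the words that are not stop words.
    return " ".join(word for word in text.split() if word not in STOP_WORDS)

cleaned = clean_review("This phone is great, and the battery lasts!")
print(cleaned)
```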

Step 5. Tokenizing the words and then using CountVectorizer to build a sparse matrix. To represent the input dataset as a Bag of Words, we fit CountVectorizer on the text and call its transform method (or fit_transform to do both at once).
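
A small sketch of this step with scikit-learn's `CountVectorizer`, using its default settings. The three reviews are made up; `fit_transform` learns the vocabulary and builds the sparse count matrix in one call.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up, already cleaned reviews.
reviews = ["great phone great battery", "bad battery", "great screen"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)  # sparse matrix: one row per review

# vocabulary_ maps each word to its column index (columns are in sorted word order).
print(sorted(vectorizer.vocabulary_))
print(X.toarray())
```

Each row is one review and each column one vocabulary word, so the first row holds a 2 in the "great" column.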

Step 6. Splitting the data into the Train and Test set.
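
The split could look like the sketch below, assuming scikit-learn's `train_test_split`. The eight reviews, their labels, the 25% test size, and the fixed `random_state` are all illustrative choices, not values from the original article.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Made-up reviews with labels (1 = positive, 0 = negative).
reviews = [
    "great phone", "bad battery", "love the screen", "poor quality",
    "works great", "stopped working", "good value", "terrible sound",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

X = CountVectorizer().fit_transform(reviews)

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0
)
print(X_train.shape, X_test.shape)
```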

Step 7. Now we can fit any classification algorithm (for example KNN, SVM, or RandomForestClassifier) to the training data.

Using KNN
Using SVM
Using Random Forest Classifier
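
A sketch of fitting the three classifiers named above on the same Bag of Words features. The training data is a made-up toy set and the hyperparameters (`n_neighbors=3`, a linear kernel, 50 trees) are assumptions, so the predictions say nothing about how these models behave on the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Made-up training and test reviews (1 = positive, 0 = negative).
train_reviews = [
    "great phone", "great battery", "love great screen",
    "bad battery", "bad screen", "terrible bad phone",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_reviews = ["great battery", "bad screen"]

# Learn the vocabulary on the training text only, then reuse it for the test text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_reviews)
X_test = vectorizer.transform(test_reviews)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    predictions[name] = model.predict(X_test).tolist()
    print(name, predictions[name])
```

Because all three estimators share the same `fit` and `predict` interface, swapping one classifier for another only changes the entry in `models`.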

Step 8. Creating a confusion matrix.
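
A minimal sketch using scikit-learn's `confusion_matrix`. The true and predicted labels below are invented for illustration; in the actual workflow they would be the test labels and a fitted model's predictions.

```python
from sklearn.metrics import confusion_matrix

# Invented labels for illustration (1 = positive, 0 = negative).
y_test = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Rows are the true classes, columns the predicted classes, both ordered [0, 1].
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```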

Step 9. Analyzing the result individually.

KNN Result
Random Forest Classifier Result
SVM Result
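
To compare models, each confusion matrix can be reduced to a few summary numbers. The sketch below is plain Python; the helper name `summarize` is hypothetical, and the two matrices are placeholder counts invented for the example, not the results obtained in this article.

```python
def summarize(cm):
    # cm is laid out as [[TN, FP], [FN, TP]].
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Placeholder counts, invented for this example only.
placeholder_matrices = {
    "Model A": [[40, 10], [5, 45]],
    "Model B": [[30, 20], [10, 40]],
}

results = {name: summarize(cm) for name, cm in placeholder_matrices.items()}
for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Accuracy alone can be misleading when one class dominates, which is why precision and recall are reported alongside it.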