Sentiment Analysis with the Most Promising Tools on the Market: And the Winner Is?

Original article was published by Vyom Aggarwal on Artificial Intelligence on Medium


I want to demonstrate how existing NLP tools and custom-built text classifier models (I used the EazyML platform) can help determine the sentiment of a text: whether it is positive, negative, or neutral. For this purpose, I first look at the established tools and calculate their accuracy on a baseline dataset. Finally, I examine a state-of-the-art sentiment analysis library and calculate its accuracy on the same dataset.

NLTK

The simplest way to begin extracting sentiment labels from text is NLTK's Vader. Vader is a lexicon- and rule-based sentiment analysis tool specifically calibrated to sentiments most commonly expressed on social media platforms. When calculating a polarity score, Vader outputs four metrics: compound, negative, positive, and neutral. The compound score is the sum of all lexicon ratings, normalized to lie between -1 (most negative) and +1 (most positive). Negative, positive, and neutral represent the proportion of the text that falls into each of these categories.

Tip: Use Python3 for a smooth experience.

import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

def nltk_sentiment_analysis(text):
    sentiment = SentimentIntensityAnalyzer()
    total_sentiment = sentiment.polarity_scores(text)
    return total_sentiment

print(nltk_sentiment_analysis('You are awesome.'))
# OUTPUT: {'neg': 0.0, 'neu': 0.328, 'pos': 0.672, 'compound': 0.6249}
# The compound value of 0.62 indicates that the sentence has positive sentiment.
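To make the normalization step more concrete, here is a minimal sketch of how an unbounded sum of lexicon ratings can be squashed into the range between -1 and +1. It uses the x / sqrt(x² + α) form with α = 15, which to my knowledge is what Vader's reference implementation does. The function name `normalize_compound` and the raw sums below are illustrative assumptions, and the real analyzer also applies rules for punctuation, capitalization, and negation before this step.

```python
import math

def normalize_compound(raw_sum, alpha=15):
    """Squash an unbounded sum of ratings into the open interval (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)

# Illustrative raw sums (assumed values, not read from the lexicon).
for raw in (-8.0, 0.0, 3.1, 8.0):
    print(raw, round(normalize_compound(raw), 4))
```

With an assumed raw sum of 3.1, this gives about 0.6249, the same compound value printed for 'You are awesome.'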

Now let's calculate the accuracy on our generalized dataset, which can be downloaded from here.

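import xlrd  # note: xlrd 2.0+ reads only .xls files; reading .xlsx needs xlrd < 2.0

data_file_object = xlrd.open_workbook('sentiment_analysis.xlsx')
sheet = data_file_object.sheet_by_index(0)
nrows = sheet.nrows
correct_count = 0

# Start at row 1, which skips the first row (presumably a header).
# Column 0 holds the text and column 1 the expected label.
for i in range(1, int(nrows)):
    text = sheet.row_values(i)[0].strip()
    input_sentiment = sheet.row_values(i)[1].strip()
    output_sentiment = nltk_sentiment_analysis(text)
    # Map the compound score to a label using a +/-0.2 neutral band.
    if output_sentiment['compound'] > 0.2:
        result_sentiment = 'Positive'
    elif output_sentiment['compound'] < -0.2:
        result_sentiment = 'Negative'
    else:
        result_sentiment = 'Neutral'
    if input_sentiment in result_sentiment:
        correct_count = correct_count + 1

# nrows counts every row in the sheet, including the skipped first row.
print('Accuracy:', str((correct_count/nrows)*100), '%')
# OUTPUT: Accuracy: 50.7%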
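The evaluation loop above does two things: it maps a score to a label and counts matches. As a rough sketch, the same logic can be pulled into two small functions that work on plain lists, which makes the thresholds easy to change. The names `label_from_score` and `accuracy` and the toy data are my own assumptions, not part of the original code. Note that this version divides by the number of labeled rows rather than by the sheet's row count.

```python
def label_from_score(score, threshold=0.2):
    """Map a polarity score in [-1, 1] to one of three labels."""
    if score > threshold:
        return 'Positive'
    if score < -threshold:
        return 'Negative'
    return 'Neutral'

def accuracy(expected_labels, predicted_labels):
    """Percentage of positions where the predicted label equals the expected one."""
    if not expected_labels:
        return 0.0
    matches = sum(e == p for e, p in zip(expected_labels, predicted_labels))
    return 100.0 * matches / len(expected_labels)

# Toy data for illustration only (not taken from the article's dataset).
scores = [0.62, -0.45, 0.05, 0.30]
expected = ['Positive', 'Negative', 'Neutral', 'Neutral']
predicted = [label_from_score(s) for s in scores]
print(predicted)
print(accuracy(expected, predicted))
```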

TextBlob

Another library that provides text-processing operations in a straightforward fashion is TextBlob. It differs from Vader by returning a tuple with a polarity score and a subjectivity score. Subjectivity is a float in the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. Subjective sentences express personal feelings, views, beliefs, opinions, allegations, desires, suspicions, and speculations, whereas objective sentences are factual. Polarity is a float in the range [-1.0, 1.0], where 0 indicates neutral, +1 indicates a very positive sentiment, and -1 indicates a very negative sentiment.

from textblob import TextBlob

def textblob_sentiment_analysis(text):
    text_blob = TextBlob(text)
    total_sentiment = text_blob.sentiment
    return total_sentiment

print(textblob_sentiment_analysis('You are awesome.'))
# OUTPUT: Sentiment(polarity=1.0, subjectivity=1.0)
# The output shows the text is very positive and very subjective.

Accuracy for TextBlob (dataset):


# Same setup as above (open the workbook, reset correct_count = 0)
for i in range(1, int(nrows)):
    text = sheet.row_values(i)[0].strip()
    input_sentiment = sheet.row_values(i)[1].strip()
    output_sentiment = textblob_sentiment_analysis(text)
    # Map the polarity score to a label using a +/-0.2 neutral band.
    if output_sentiment.polarity > 0.2:
        result_sentiment = 'Positive'
    elif output_sentiment.polarity < -0.2:
        result_sentiment = 'Negative'
    else:
        result_sentiment = 'Neutral'
    if input_sentiment in result_sentiment:
        correct_count = correct_count + 1

print('Accuracy:', str(correct_count*100/nrows), '%')
# OUTPUT: Accuracy: 19%
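A single accuracy figure does not say which labels a tool gets wrong. As a rough sketch, a per-label breakdown like the one below could show whether, for example, many rows end up as 'Neutral' because their polarity falls inside the ±0.2 band. The helper name `per_class_accuracy` and the toy labels are assumptions made for illustration; they are not results from the article's dataset.

```python
from collections import Counter

def per_class_accuracy(expected_labels, predicted_labels):
    """Return, for each expected label, the percentage of rows predicted correctly."""
    totals = Counter(expected_labels)
    hits = Counter(e for e, p in zip(expected_labels, predicted_labels) if e == p)
    return {label: 100.0 * hits[label] / totals[label] for label in totals}

# Toy labels for illustration only.
expected = ['Positive', 'Positive', 'Negative', 'Neutral', 'Neutral']
predicted = ['Positive', 'Neutral', 'Neutral', 'Neutral', 'Neutral']
print(per_class_accuracy(expected, predicted))
```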

Flair

Flair allows you to apply natural language processing (NLP) models to sections of text. It works quite differently from the previously mentioned tools. Flair uses a pre-trained model to classify a comment as positive or negative, and it prints a number in brackets after the label, which is the prediction confidence.

Tip: Installing Flair requires Python 3. If you run into problems importing flair after installation, go to its GitHub repo and pull the latest bug fixes into your flair library.

import flair

flair_sentiment = flair.models.TextClassifier.load('en-sentiment')

def flair_sentiment_analysis(text):
    s = flair.data.Sentence(text)
    flair_sentiment.predict(s)
    total_sentiment = s.labels
    # Split the printed form of the first label into the label name and its confidence.
    return str(total_sentiment[0]).split()

print(flair_sentiment_analysis('You are awesome.'))
# OUTPUT: ('POSITIVE', '0.9868430495262146')

Flair’s accuracy (dataset):

# Same setup as above (open the workbook, reset correct_count = 0)
for i in range(1, int(nrows)):
    text = sheet.row_values(i)[0].strip()
    input_sentiment = sheet.row_values(i)[1].strip()
    output_sentiment = flair_sentiment_analysis(text)
    # The model answers only POSITIVE or NEGATIVE, so rows labeled Neutral never match.
    if input_sentiment.lower() in output_sentiment[0].lower():
        correct_count = correct_count + 1

print("Accuracy:", str((correct_count/nrows)*100), '%')
# OUTPUT: Accuracy: 45.07%
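Because the pre-trained model only answers POSITIVE or NEGATIVE, it has no way to produce a Neutral label on its own. One possible workaround, which is not what this article measures, is to treat low-confidence predictions as Neutral. The sketch below assumes the label and confidence have already been extracted as a string and a float; the function name `three_way_label` and the 0.8 cutoff are hypothetical choices that would need tuning.

```python
def three_way_label(label, confidence, min_confidence=0.8):
    """Return 'Neutral' when the binary classifier is not confident enough."""
    if confidence < min_confidence:
        return 'Neutral'
    # 'POSITIVE' -> 'Positive', 'NEGATIVE' -> 'Negative'
    return label.capitalize()

print(three_way_label('POSITIVE', 0.9868430495262146))
print(three_way_label('NEGATIVE', 0.55))
```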

EazyML

I was really impressed by EazyML and truly believe it delivers state-of-the-art performance for sentiment analysis and text classification. Why is EazyML so important for NLP? It is an ML platform with both a GUI and an API, and it offers functionality such as concept extraction, sentiment analysis, and a powerful automation process for applying ML to real-world problems. To try EazyML, click here.

Let me show you how simple it is to get started with EazyML!

First, get your auth_token from (here).

import json
import requests

auth_token = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJqdGkiOiIzYTE5M2ZjMS03MWU3LTQ4YzMtYTViNC1hM2Q2NmRlOWUyNzIiLCJleHAiOjE1ODkxODk3MzIsImZyZXNoIjpmYWxzZSwiaWF0IjoxNTg5MTAzMzMyLCJ0eXBlIjoiYWNjZXNzIiwibmJmIjoxNTg5MTAzMzMyLCJpZGVudGl0eSI6ImFwcF9kZW1vIn0.mxzl7PqVPdLD7bUGsb04g-PoGP7iOyL3_ROTtkcTCws'

def EazyML_sentiment_analysis(text, options, auth_token):
    APP_REQUEST_URL = "https://development.eazyml.com/ez_app/ez_sentiments"
    payload = {"text": text, "options": options}
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + str(auth_token),
    }
    response = requests.request("POST", APP_REQUEST_URL, headers=headers, data=json.dumps(payload))
    try:
        response_json = response.json()
        # The first data row holds the input text and its sentiment score.
        return response_json["dataframe"]["data"][0]
    except Exception as e:
        print(e)
        return None

print(EazyML_sentiment_analysis('You are awesome.', '', auth_token))
# OUTPUT: ['You are awesome.', 0.9107961537401248]
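Since the token is a credential, it may be safer to read it from the environment than to paste it into the script. Here is a minimal sketch of that idea. The environment variable name `EAZYML_AUTH_TOKEN` and both helper names are my own assumptions, not something the EazyML documentation prescribes; the headers are the same two that the request above sends.

```python
import os

def load_auth_token(env_var='EAZYML_AUTH_TOKEN'):
    """Read the token from the environment; raise a clear error if it is missing."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError('Set the ' + env_var + ' environment variable first.')
    return token

def build_headers(token):
    """Build the Content-Type and Authorization headers for a JSON request."""
    return {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ' + str(token),
    }

# Demonstration with a placeholder value; a real token would be exported in the shell.
os.environ.setdefault('EAZYML_AUTH_TOKEN', 'placeholder-token')
print(build_headers(load_auth_token()))
```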

Check out https://eazyml.com/appdocs for in-depth knowledge.

EazyML’s accuracy supports its state-of-the-art status (dataset):

# Same setup as above (open the workbook, reset correct_count = 0, reuse auth_token)
for i in range(1, int(nrows)):
    try:
        text = sheet.row_values(i)[0].strip()
        input_sentiment = sheet.row_values(i)[1].strip()
        output_sentiment = EazyML_sentiment_analysis(text, '', auth_token)
        # The score lies between 0 and 1; treat 0.35 to 0.65 as the neutral band.
        if output_sentiment[1] > 0.65:
            result_sentiment = 'Positive'
        elif output_sentiment[1] < 0.35:
            result_sentiment = 'Negative'
        else:
            result_sentiment = 'Neutral'
        if str(input_sentiment) == str(result_sentiment):
            correct_count = correct_count + 1
    except Exception as e:
        # A failed request returns None, which raises here and is counted as a miss.
        print(e)

print("Accuracy:", str(correct_count*100/nrows), '%')
# OUTPUT: Accuracy: 66%
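With all four accuracy figures now reported, a few lines are enough to put them side by side. The numbers are the ones shown in the OUTPUT comments of this article; the dictionary and variable names are just for illustration.

```python
# Accuracy figures as reported in the OUTPUT comments of each section.
reported_accuracy = {
    'NLTK Vader': 50.7,
    'TextBlob': 19.0,
    'Flair': 45.07,
    'EazyML': 66.0,
}

# Sort from highest to lowest accuracy.
ranking = sorted(reported_accuracy.items(), key=lambda item: item[1], reverse=True)
for tool, acc in ranking:
    print(f'{tool}: {acc}%')

best_tool, best_acc = ranking[0]
print('Winner:', best_tool)
```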

Conclusion

After this extensive analysis, I would say EazyML has outperformed all the other sentiment analysis tools on this dataset: 66% accuracy, against 50.7% for NLTK Vader, 45.07% for Flair, and 19% for TextBlob. I would encourage you to run the code yourself. I have listed the links to all useful resources in the Reference section. I will keep posting more articles as my NLP journey continues. So stay tuned and until next time, happy coding!!

Reference

https://textblob.readthedocs.io/en/dev/