Source: Deep Learning on Medium
Sentiment Analysis is a classic example of machine learning, which (in a nutshell) refers to:
“A way of ‘learning’ that enables algorithms to evolve. This ‘learning’ means feeding the algorithm with a massive amount of data so that it can adjust itself and continually improve.”
Sentiment analysis is the automated process of understanding an opinion about a given subject from written or spoken language.
In a world where we generate 2.5 quintillion bytes of data every day, sentiment analysis has become a key tool for making sense of that data. This has allowed companies to get key insights and automate all kinds of processes.
The Basics of Sentiment Analysis
Sentiment Analysis, also known as Opinion Mining, is a field within Natural Language Processing (NLP) that builds systems that try to identify and extract opinions within text. Usually, besides identifying the opinion, these systems extract attributes of the expression, e.g.:
- Polarity: whether the speaker expresses a positive or negative opinion,
- Subject: the thing that is being talked about,
- Opinion holder: the person, or entity that expresses the opinion.
Sentiment analysis is currently a topic of great interest and development because it has many practical applications. As the publicly and privately available information on the Internet keeps growing, a large number of texts expressing opinions are available on review sites, forums, blogs, and social media.
With the help of sentiment analysis systems, this unstructured information could be automatically transformed into structured data of public opinions about products, services, brands, politics, or any topic that people can express opinions about. This data can be very useful for commercial applications like marketing analysis, public relations, product reviews, net promoter scoring, product feedback, and customer service.
What Is an Opinion?
Text information can be broadly categorized into two main types: facts and opinions. Facts are objective expressions about something. Opinions are usually subjective expressions that describe people’s sentiments, appraisals, and feelings toward a subject or topic.
Sentiment analysis, like many other NLP problems, can be modeled as a classification problem where two sub-problems must be resolved:
- Classifying a sentence as subjective or objective, known as subjectivity classification.
- Classifying a sentence as expressing a positive, negative or neutral opinion, known as polarity classification.
In an opinion, the entity the text talks about can be an object, its components, its aspects, its attributes, or its features. It could also be a product, a service, an individual, an organization, an event, or a topic. As an example, take a look at the opinion below:
“The battery life of this camera is too short.”
A negative opinion is expressed about a feature (battery life) of an entity (camera).
Direct vs Comparative Opinions
There are two kinds of opinions: direct and comparative. Direct opinions give an opinion about an entity directly, for example:
“The picture quality of camera A is poor.”
This direct opinion states a negative opinion about camera A.
In comparative opinions, the opinion is expressed by comparing an entity with another, for example:
“The picture quality of camera A is better than that of camera B.”
Usually, comparative opinions express similarities or differences between two or more entities using a comparative or superlative form of an adjective or adverb. In the previous example, there’s a positive opinion about camera A and, conversely, a negative opinion about camera B.
Explicit vs Implicit Opinions
An explicit opinion on a subject is an opinion explicitly expressed in a subjective sentence. The following sentence expresses an explicit positive opinion:
“The voice quality of this phone is amazing.”
An implicit opinion on a subject is an opinion implied in an objective sentence. The following sentence expresses an implicit negative opinion:
“The earphone broke in two days.”
Within implicit opinions we could include metaphors, which may be the most difficult type of opinion to analyze because they carry a lot of semantic information.
Sentiment Analysis Scope
Sentiment analysis can be applied at different levels of scope:
- Document-level sentiment analysis obtains the sentiment of a complete document or paragraph.
- Sentence-level sentiment analysis obtains the sentiment of a single sentence.
- Sub-sentence-level sentiment analysis obtains the sentiment of sub-expressions within a sentence.
Why Is Sentiment Analysis Important?
It’s estimated that 80% of the world’s data is unstructured and not organized in a pre-defined manner. Most of this comes from text data, like emails, support tickets, chats, social media, surveys, articles, and documents. These texts are usually difficult, time-consuming and expensive to analyze, understand, and sort through.
Sentiment analysis systems allow companies to make sense of this sea of unstructured text by automating business processes, getting actionable insights, and saving hours of manual data processing. In other words, they make teams more efficient.
Some of the advantages of sentiment analysis include the following:
- Scalability:
Can you imagine manually sorting through thousands of tweets, customer support conversations, or customer reviews? There’s just too much data to process manually. Sentiment analysis allows you to process data at scale in an efficient and cost-effective way.
- Real-time analysis:
We can use sentiment analysis to identify critical information that allows situational awareness during specific scenarios in real-time. Is there a PR crisis in social media about to burst? An angry customer that is about to churn? A sentiment analysis system can help you immediately identify these kinds of situations and take action.
- Consistent criteria:
Humans don’t observe clear criteria for evaluating the sentiment of a piece of text. It’s estimated that different people only agree around 60–65% of the time when judging the sentiment of a particular piece of text. It’s a subjective task which is heavily influenced by personal experiences, thoughts, and beliefs. By using a centralized sentiment analysis system, companies can apply the same criteria to all of their data. This helps to reduce errors and improve data consistency.
How Does Sentiment Analysis Work?
Sentiment Analysis Algorithms
There are many methods and algorithms to implement sentiment analysis systems, which can be classified as:
- Rule-based systems that perform sentiment analysis based on a set of manually crafted rules.
- Automatic systems that rely on machine learning techniques to learn from data.
- Hybrid systems that combine both rule-based and automatic approaches.
Usually, rule-based approaches define a set of rules in some kind of scripting language that identify subjectivity, polarity, or the subject of an opinion.
The rules may use a variety of inputs, such as the following:
- Classic NLP techniques like stemming, tokenization, part-of-speech tagging and parsing.
- Other resources, such as lexicons (i.e. lists of words and expressions).
A basic example of a rule-based implementation would be the following:
- Define two lists of polarized words (e.g. negative words such as bad, worst, ugly, etc. and positive words such as good, best, beautiful, etc.).
- Given a text:
- Count the number of positive words that appear in the text.
- Count the number of negative words that appear in the text.
- If the number of positive word appearances is greater than the number of negative word appearances, return a positive sentiment. If the number of negative word appearances is greater, return a negative sentiment. Otherwise, return neutral.
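The steps above can be sketched in a few lines of Python. This is only a minimal illustration: the two word lists reuse the example words from the text, and the `tokenize` and `rule_based_sentiment` names are made up for this sketch. A real rule-based system would use a much larger lexicon and handle things like negation.

```python
import re

# Hypothetical mini-lexicons built from the example words in the text.
POSITIVE_WORDS = {"good", "best", "beautiful"}
NEGATIVE_WORDS = {"bad", "worst", "ugly"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rule_based_sentiment(text):
    """Return 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    tokens = tokenize(text)
    positives = sum(1 for token in tokens if token in POSITIVE_WORDS)
    negatives = sum(1 for token in tokens if token in NEGATIVE_WORDS)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


print(rule_based_sentiment("The best camera, with a beautiful screen"))
print(rule_based_sentiment("Bad battery and an ugly case"))
print(rule_based_sentiment("The camera has a good lens but a bad battery"))
```

Note that the last sentence comes out as neutral because one positive and one negative word cancel each other out, which hints at why simple counting rules struggle with mixed opinions.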
Automatic methods, contrary to rule-based systems, don’t rely on manually crafted rules, but on machine learning techniques. The sentiment analysis task is usually modeled as a classification problem where a classifier is fed with a text and returns the corresponding category, e.g. positive, negative, or neutral (in case polarity analysis is being performed).
Said machine learning classifier can usually be implemented with the following steps and components:
The Training and Prediction Processes
In the training process, our model learns to associate a particular input (i.e. a text) to the corresponding output (tag) based on the samples used for training. The feature extractor transforms the text input into a feature vector. Pairs of feature vectors and tags (e.g. positive, negative, or neutral) are fed into the machine learning algorithm to generate a model.
In the prediction process, the feature extractor is used to transform unseen text inputs into feature vectors. These feature vectors are then fed into the model, which generates predicted tags (again, positive, negative, or neutral).
Feature Extraction from Text
The first step in a machine learning text classifier is to transform the text into a numerical representation, usually a vector. Usually, each component of the vector represents the frequency of a word or expression in a predefined dictionary (e.g. a lexicon of polarized words). This process is known as feature extraction or text vectorization and the classical approach has been bag-of-words or bag-of-ngrams with their frequency.
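To make the bag-of-words idea concrete, here is a minimal sketch in plain Python. The function names (`build_vocabulary`, `vectorize`) and the two-sentence corpus are assumptions made for illustration; in practice you would normally rely on a library vectorizer rather than writing your own.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Map every distinct word in the corpus to a fixed vector position."""
    words = sorted({token for text in texts for token in tokenize(text)})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocabulary):
    """Return the word-frequency vector of a text; unknown words are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


# Hypothetical toy corpus used only to build the dictionary.
corpus = ["good camera good price", "bad battery"]
vocabulary = build_vocabulary(corpus)
print(vocabulary)
print(vectorize("good good bad phone", vocabulary))
```

Each position in the resulting vector corresponds to one dictionary word, so "phone", which is not in the dictionary, simply leaves no trace in the vector.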
The classification step usually involves a statistical model like Naïve Bayes, Logistic Regression, Support Vector Machines, or Neural Networks:
- Naïve Bayes: a family of probabilistic algorithms that uses Bayes’s Theorem to predict the category of a text.
- Logistic Regression: a very well-known algorithm in statistics used to predict the probability of a category (Y) given a set of features (X).
- Support Vector Machines: a non-probabilistic model which uses a representation of text examples as points in a multidimensional space. These examples are mapped so that the examples of the different categories (sentiments) belong to distinct regions of that space. Then, new texts are mapped onto that same space and predicted to belong to a category based on which region they fall into.
- Deep Learning: a diverse set of algorithms that attempts to imitate how the human brain works by employing artificial neural networks to process data.
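As an illustration of the first option, here is a small Naïve Bayes sketch in plain Python that covers both training and prediction. It is a simplified multinomial variant with add-one smoothing, not a production implementation, and the four training sentences (borrowed from the examples earlier in this article) plus their tags are assumptions chosen for the demo.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(samples):
    """Training: count tags and per-tag word frequencies from (text, tag) pairs."""
    tag_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, tag in samples:
        tag_counts[tag] += 1
        for token in tokenize(text):
            word_counts[tag][token] += 1
            vocabulary.add(token)
    return tag_counts, word_counts, vocabulary


def predict(text, model):
    """Prediction: return the tag with the highest log posterior probability."""
    tag_counts, word_counts, vocabulary = model
    total_samples = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag in sorted(tag_counts):
        score = math.log(tag_counts[tag] / total_samples)  # log prior
        total_words = sum(word_counts[tag].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training carry no information
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[tag][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Hypothetical, hand-tagged training samples.
training_samples = [
    ("The voice quality of this phone is amazing", "positive"),
    ("I love the picture quality", "positive"),
    ("The battery life of this camera is too short", "negative"),
    ("The earphone broke in two days", "negative"),
]
model = train_naive_bayes(training_samples)
print(predict("amazing picture quality", model))
print(predict("the battery broke", model))
```

With only four training texts the model is obviously a toy, but the structure is the same as in a real system: counts learned during training become probabilities used during prediction.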
Sentiment Analysis Metrics and Evaluation
There are many ways in which you can obtain performance metrics for evaluating a classifier and to understand how accurate a sentiment analysis model is. One of the most frequently used is known as cross-validation.
Cross-validation splits the training data into a certain number of training folds (e.g. with 75% of the training data) and the same number of testing folds (with the remaining 25% of the training data). The training folds are used to train the classifier, which is then tested against the testing folds to obtain performance metrics (see below). The process is repeated multiple times and an average for each of the metrics is calculated.
If your testing set is always the same, you might be overfitting to that testing set, which means you might be adjusting your analysis to a given set of data so much that you might fail to analyze a different set. Cross-validation helps prevent that. The more data you have, the more folds you will be able to use.
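Here is a rough sketch of how the fold splitting and averaging could look in plain Python. With `k = 4`, each round trains on 75% of the samples and tests on the remaining 25%, matching the proportions mentioned above. The majority-tag "classifier" and the toy samples are placeholders invented for this example, so that the loop has something to train and evaluate.

```python
def k_fold_splits(samples, k):
    """Yield (training, testing) pairs; every sample is tested exactly once."""
    for fold in range(k):
        testing = [s for i, s in enumerate(samples) if i % k == fold]
        training = [s for i, s in enumerate(samples) if i % k != fold]
        yield training, testing


def cross_validate(samples, k, train, evaluate):
    """Train and evaluate once per fold, then average the metric."""
    scores = []
    for training, testing in k_fold_splits(samples, k):
        model = train(training)
        scores.append(evaluate(model, testing))
    return sum(scores) / len(scores)


def train_majority(training):
    """Placeholder 'classifier': always predict the most common training tag."""
    tags = [tag for _, tag in training]
    return max(sorted(set(tags)), key=tags.count)


def accuracy(model, testing):
    """Share of testing samples whose tag equals the predicted majority tag."""
    return sum(1 for _, tag in testing if tag == model) / len(testing)


# Hypothetical toy data: eight tagged texts.
samples = [(f"text {i}", "positive" if i % 4 else "negative") for i in range(8)]
for training, testing in k_fold_splits(samples, 4):
    print(len(training), "training samples,", len(testing), "testing samples")
print("average accuracy:", cross_validate(samples, 4, train_majority, accuracy))
```

In a real project you would usually shuffle the data before splitting and plug in an actual classifier instead of the majority baseline.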
Precision, Recall, and Accuracy
Precision, recall, and accuracy are standard metrics used to evaluate the performance of a classifier.
Precision measures how many texts were predicted correctly as belonging to a given category out of all of the texts that were predicted (correctly and incorrectly) as belonging to the category.
Recall measures how many texts were predicted correctly as belonging to a given category out of all the texts that should have been predicted as belonging to the category. In general, the more data we feed our classifiers with, the better recall tends to be.
Accuracy measures how many texts were predicted correctly (both as belonging to a category and not belonging to the category) out of all of the texts in the corpus.
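The three definitions translate directly into code. The sketch below computes them for a single category from two parallel lists of tags; the function name `evaluate_category` and the eight example tags are assumptions made for illustration.

```python
def evaluate_category(expected, predicted, category):
    """Return (precision, recall, accuracy) for one category."""
    pairs = list(zip(expected, predicted))
    true_pos = sum(1 for e, p in pairs if e == category and p == category)
    false_pos = sum(1 for e, p in pairs if e != category and p == category)
    false_neg = sum(1 for e, p in pairs if e == category and p != category)
    true_neg = sum(1 for e, p in pairs if e != category and p != category)
    # Guard against division by zero when a category is never predicted or never present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    accuracy = (true_pos + true_neg) / len(pairs)
    return precision, recall, accuracy


# Hypothetical gold tags and classifier predictions for eight texts.
expected = ["positive", "positive", "positive", "negative",
            "negative", "neutral", "neutral", "neutral"]
predicted = ["positive", "positive", "negative", "positive",
             "negative", "neutral", "positive", "neutral"]

precision, recall, accuracy = evaluate_category(expected, predicted, "positive")
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

Here the classifier finds two of the three positive texts (recall of about 0.67), but only half of the texts it tags as positive really are positive (precision of 0.5).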
Most frequently, precision and recall are used to measure performance since accuracy alone does not say much about how good or bad a classifier is.
For a difficult task like analyzing sentiment, precision and recall levels are likely to be low at first. As you feed the classifier with more data, performance should improve. However, as we will see below, since annotated data is not likely to be accurate, the chances are that precision levels won’t get too high. That said, if you feed the classifier consistently tagged data, results are going to be as good as results can be for any other classification problem.
When it comes to inter-annotator agreement (i.e. agreement by humans on a given annotation task), one of the most frequently used metrics is Krippendorff’s Alpha. According to Saif et al., best inter-annotator agreement for Twitter sentiment analysis reaches a 0.655 value of Krippendorff’s Alpha. This means there is a good deal of agreement (since alpha is greater than zero), but we believe it’s still far from great (e.g. around 0.8, which is the minimum reliability threshold social scientists use in order to say data is reliable). This said, only tentative conclusions about the sentiment of tweets can be drawn from the results of the annotation tasks described in the paper cited above.
All in all, this 0.655 is an indicator of the difficulty of sentiment analysis detection for humans as well. Taking into consideration that machines learn from the data they are fed with, automatic predictions are likely to mirror the human disagreement embedded in the data.
The concept of hybrid methods is very intuitive: just combine the best of both worlds, the rule-based and the automatic ones. Usually, by combining both approaches, the methods can improve accuracy and precision.
Neural Networks, commonly known as Artificial Neural Networks (ANN), are a family of Machine Learning techniques modeled on the human brain. Being able to extract hidden patterns within data is a key ability for any Data Scientist, and Neural Network approaches may be especially useful for extracting patterns from images, video or speech.
The structure of a neural network:
The network consists of different components:
- Input layer: this reflects the potential descriptive factors that may help in prediction.
- Hidden layer: a user-defined number of layers with a specified number of neurons in each layer.
- Output layer: this reflects the thing you are trying to predict. For example, this could be the label of an image or a more traditional 0/1 outcome.
- Weights: each neuron in a given layer is potentially connected to every neuron in adjacent layers — the weight sets the importance of this link. At first these weights should be randomized.
Building blocks and functionality of ANNs
The neuron is the building unit of a neural network, and it imitates the functionality of a human neuron. Typical neural networks use the sigmoid function, which is shown below. This function is used mostly because its derivative can be written in terms of f(x) itself, which comes in handy when minimizing error.
z = ∑ w×x
y = sigmoid(z)
w = weights
x = inputs
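The formulas above can be written as a tiny Python sketch. It also shows the property mentioned earlier: the derivative of the sigmoid can be expressed through the sigmoid itself, as f(z) × (1 − f(z)). The function names and the sample weights and inputs are just illustrative choices.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative written in terms of the function itself: f(z) * (1 - f(z))."""
    y = sigmoid(z)
    return y * (1.0 - y)


def neuron_output(weights, inputs):
    """z = sum of w * x over all inputs, then y = sigmoid(z)."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)


# Hypothetical weights and inputs for a neuron with three inputs.
weights = [0.4, -0.2, 0.1]
inputs = [1.0, 0.5, 2.0]
print(neuron_output(weights, inputs))
print(sigmoid_derivative(0.0))
```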
Demo application using ANN
Let us review an example that will train using two images and apply a filter onto a given image. The following are the source and the target image for the training process.
We have used an ANN that uses backpropagation in order to adjust errors. The intention of the training is to find a function f(red, green, blue, alpha) to match the target color transformation. The target image is made using several color adjustments of the source image.
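The demo's own code is not shown here, so the following is only a rough sketch of the idea under several assumptions: a network with one hidden layer of three sigmoid neurons, no bias terms, a learning rate of 0.5, and four made-up source/target pixel pairs (RGBA values scaled to the 0 to 1 range) standing in for the two images. It trains with plain backpropagation on a squared-error loss.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(pixel, w_hidden, w_output):
    """Return (hidden activations, output activations) for one RGBA pixel."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, pixel))) for row in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_output]
    return hidden, output


def mean_squared_error(pairs, w_hidden, w_output):
    """Average squared error per pixel over all source/target pairs."""
    total = 0.0
    for source, target in pairs:
        _, output = forward(source, w_hidden, w_output)
        total += sum((o - t) ** 2 for o, t in zip(output, target))
    return total / len(pairs)


def train(pairs, hidden_size=3, epochs=2000, rate=0.5, seed=0):
    """Fit the network to the pixel pairs with backpropagation."""
    rng = random.Random(seed)  # the weights start out randomized
    w_hidden = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(hidden_size)]
    w_output = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(4)]
    for _ in range(epochs):
        for source, target in pairs:
            hidden, output = forward(source, w_hidden, w_output)
            # Output error signal: (y - t) * f'(z), with f'(z) = y * (1 - y).
            delta_out = [(o - t) * o * (1 - o) for o, t in zip(output, target)]
            # Propagate the error signal back through the output weights.
            delta_hidden = [
                h * (1 - h) * sum(delta_out[k] * w_output[k][j] for k in range(4))
                for j, h in enumerate(hidden)
            ]
            for k in range(4):
                for j in range(hidden_size):
                    w_output[k][j] -= rate * delta_out[k] * hidden[j]
            for j in range(hidden_size):
                for i in range(4):
                    w_hidden[j][i] -= rate * delta_hidden[j] * source[i]
    return w_hidden, w_output


# Hypothetical (source RGBA, target RGBA) pairs for an invented color filter.
pairs = [
    ((1.0, 0.0, 0.0, 1.0), (0.8, 0.1, 0.1, 0.9)),
    ((0.0, 1.0, 0.0, 1.0), (0.1, 0.8, 0.1, 0.9)),
    ((0.0, 0.0, 1.0, 1.0), (0.1, 0.1, 0.8, 0.9)),
    ((0.5, 0.5, 0.5, 1.0), (0.4, 0.4, 0.4, 0.9)),
]
w_hidden, w_output = train(pairs)
print("error after training:", mean_squared_error(pairs, w_hidden, w_output))
```

Calling `train(pairs, epochs=0)` returns the untrained random weights, which makes it easy to compare the error before and after training.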
Convolutional Neural Networks
Convolutional Neural Networks (ConvNets or CNNs) are a category of Neural Networks that have proven very effective in areas such as image recognition and classification. CNNs have been successful in identifying faces, objects and traffic signs apart from powering vision in robots and self driving cars.
For example, a CNN is able to recognize scenes and suggest relevant captions (“a soccer player is kicking a soccer ball”), and CNNs are also used for recognizing everyday objects, humans and animals. Lately, CNNs have been effective in several Natural Language Processing tasks (such as sentence classification) as well.
Sentiment Analysis Use Cases & Applications
Sentiment Analysis in Social Media Monitoring
On the fateful evening of April 9th, 2017, United Airlines forcibly removed a passenger from an overbooked flight. The nightmare-ish incident was filmed by other passengers on their smartphones and posted immediately. One such video, posted to Facebook, was shared more than 87,000 times and viewed 6.8 million times by 6pm on Monday, just 24 hours later.
The fiasco was magnified horrifically by the company’s dismissive response. On Monday afternoon, they tweeted a statement from the CEO apologizing for “having to re-accommodate customers.” Cue public outrage: you can imagine the field day on Twitter.
This is exactly the kind of PR catastrophe we’d all happily do without. This is also an excellent example of why we care not only about whether people are talking about our brand, but how they’re talking about it. More mentions do not equal positive mentions.
In today’s day and age, brands of all shapes and sizes have meaningful interactions with customers, leads, and even competition on social networks like Facebook, Twitter, and Instagram. Most marketing departments are already tuned in to online mentions as far as volume goes: they measure more chatter as more brand awareness. Nowadays, however, we can go a step deeper. By using sentiment analysis on social media, we can get incredible insights into the quality of conversation that’s happening around a brand.
How Sentiment Analysis Can Be Used
- Analyze tweets and/or Facebook posts over a period of time to see the sentiment of a particular audience.
- Run sentiment analysis on all social media mentions of your brand and automatically categorize by urgency.
- Automatically route social media mentions to team members best fit to respond.
- Automate any or all of these processes.
- Use analytics to gain deep insight into what’s happening across your social media channels.
Sentiment analysis is useful in social media monitoring because it helps you do all of the following:
- Prioritize action. Which is more urgent: a fuming customer or a subtle “thanks!” shout-out? Obviously the fumer. Sentiment analysis lets you easily filter unread mentions by positivity and negativity, showing you which blazing fires to put on the “extinguish immediately” list and which slow smolders can wait a bit.
- Track trends over time.
- Tune in to a specific point in time, e.g. the lead-up to a new product launch or the day a particular piece of bad press dropped.
- Keep a finger on the competition. Why not monitor your competitors’ social media the same way you monitor your own? If you tune in closely, maybe you notice there’s been a negative response to a particular feature of their new product, and you respond by designing a lead generation campaign targeting exactly that gap. They won’t even know what hit them.
Over the course of a few months during the 2016 US Presidential Elections, we collected and analyzed millions of tweets mentioning Clinton or Trump posted by users from around the world. We classified each of those tweets with a sentiment of either positive, neutral, or negative.
- Negative: “Racial discord was conceived, nurtured, refined & perpetuated by Americans incl @realDonaldTrump’s father. Get real!”
- Neutral: “@HillaryClinton will receive the first question at tonight’s presidential debate, according to @CBSNews #ClintonVsTrump”.
- Positive: “Americans trust @realDonaldTrump to Make our Economy Great Again!”
- Positive: “@wcve it’s amazing how our city loves him and he really loves our city. @HillaryClinton made a great choice for Vice President. @timkaine”.
From this simple, easy analysis, we found interesting insights:
- More tweets mentioned @realDonaldTrump (~450k/day) than @HillaryClinton (~250k/day). Again, this does not equal positivity, but does imply brand awareness (and in the case of something like elections, awareness is key).
- For both candidates, there were more negative than positive tweets. Given that it’s Twitter and politics, this was not much of a surprise.
- Trump had a better positive to negative Tweet ratio than Clinton.
To sum up, more people were tweeting about Trump, and the tweets about Trump had a better positive-to-negative ratio than the tweets about Clinton.
Sentiment Analysis in Brand Monitoring
Not only do brands have a wealth of information available on social media, but they also can look more broadly across the internet to see how people are talking about them online. Instead of focusing on specific social media platforms such as Facebook and Twitter, we can target mentions in places like news, blogs, and forums, again looking at not just the volume of mentions, but also the quality of those mentions.
In our United Airlines example, for instance, the flare-up started on the social media platforms of a few passengers. Within hours, it was picked up by news sites and spread like wildfire across the US. News then spread to China and Vietnam, as the passenger was reported to be an American of Chinese-Vietnamese descent and people accused the perpetrators of racial profiling. In China, the incident became the number one trending topic on Weibo, a microblogging site with almost 500 million users.
And again, this is all happening within mere hours and days of when the incident took place.
How Sentiment Analysis Can Be Used
- Analyze news articles, blog posts, forum discussions, and other texts on the internet over a period of time to see sentiment of a particular audience.
- Automatically categorize the urgency of all online mentions of your brand via sentiment analysis.
- Automatically alert designated team members of online mentions that concern their area of work.
- Automate any or all of these processes.
- Better understand a brand’s online presence by getting all kinds of interesting insights and analytics.
Sentiment analysis is useful in brand monitoring because it helps you do all of this:
- Understand how your brand reputation evolves over time.
- Research your competition and understand how their reputation also evolves over time.
- Identify potential PR crises and know to take immediate action. Again, prioritize what fires need to be put out immediately and what mentions can wait.
- Tune into a specific point in time. Again, maybe you want to look at just press mentions on the day of your IPO filing, or a new product launch. Sentiment analysis lets you do that.
Example: Expedia Canada
Around Christmastime, Expedia Canada ran a classic “escape winter” marketing campaign. All was well, except for their choice of screeching violin as background music. Understandably, people took to social media, blogs, and forums. Expedia noticed that and removed the ad. Then, they created a series of follow-up spin-off videos: one showed the original actor smashing the violin, and in another one, they invited a real follower who had complained on Twitter to come in and rip the violin away. Though their original product was far from flawless, they were able to redeem themselves by incorporating real customer feedback into continued iterations.
Using sentiment analysis (and machine learning), you can automatically monitor all chatter around your brand and detect this type of potentially-explosive scenario while you still have time to defuse it.
Sentiment Analysis in Customer Feedback
Social media and brand monitoring offer us immediate, unfiltered, invaluable information on customer sentiment. In a parallel vein run two other troves of insight: surveys and customer support interactions. Teams often look at their Net Promoter Score (NPS), but we can also apply this analysis to any type of survey or communication channel that yields textual customer feedback.
NPS surveys ask a few simple questions (namely, “Would you recommend this company, product, and/or service to a friend or family member?” and “Why?”) and use the answers to identify customers as promoters, passives, or detractors. The goal is to identify overall customer experience, and find ways to elevate all customers to “promoter” level, where they theoretically will buy more, stay longer, and refer other customers.
Numerical survey data is easily aggregated and assessed, but we want that same ease with the “why” answers as well. A regular NPS score simply gives you a number, without the additional context of what it’s about and why the score landed there. Sentiment analysis takes it that step further.
How Sentiment Analysis Can Be Used:
- Analyze aggregated NPS or other survey responses.
- Analyze aggregated customer support interactions.
- Track customer sentiment about specific aspects of the business over time. This adds depth to explain why the overall NPS score might have changed, or if specific aspects have shifted independently.
- Target individuals to improve their service. By automating sentiment analysis on incoming surveys, you can be alerted to customers who feel strongly negatively towards your product or service, and can deal with them specifically.
- Determine if particular customer segments feel more strongly about your company. You can zero in on sentiment by certain demographics, interests, personas, etc.
Sentiment analysis is useful in understanding Voice of Customer (VoC) because it helps you do all of the following:
- Use results of sentiment analysis to design better informed questions to ask on future surveys.
- Understand the nuances of customer experience over time, along with why and how shifts are happening.
- Empower your internal teams by giving them a deeper view of the customer experience, by segment and by specific aspects of the business.
- Respond more quickly to signals and shifts from customers.
Example: McKinsey City Voices project
In Brazil, federal public spending rose by 156% from 2007 to 2015 while people’s satisfaction with public services steadily decreased. Unhappy with this counterproductive progress, the Urban-planning Department recruited McKinsey to help them work on a series of new projects that would focus first on user experience, or citizen journeys, when delivering services. This citizen-centric style of governance has led to the rise of what we call Smart Cities.
McKinsey developed a tool called City Voices, which conducts citizen (customer) surveys across more than 150 different metrics, and then runs sentiment analysis to help leaders understand how constituents live and what they need, in order to better inform public policy. By using this tool, the Brazilian government was able to surface urgent needs (a safer bus system, for instance) and improve them first.
If even whole cities and countries, famous for their red tape and slow pace, are incorporating customer journeys and sentiment analysis into their decision-making processes, then innovative companies had better be far ahead.
Sentiment Analysis in Customer Support
We all know the drill: stellar customer experiences = more probable returning customers. Particularly in recent years, there’s been a lot of talk (rightfully so) around customer experience and customer journeys. Leading companies have begun to realize that oftentimes how they deliver is just as (if not more) important as what they deliver. Nowadays, more than ever, customers expect their experience with companies to be immediate, intuitive, personal, and hassle-free. In fact, research shows that 25% of customers will switch to a competitor after just one negative interaction.
We already looked at how we can use sentiment analysis in looking at the broader VoC, but now we’ll dial in specifically on customer service teams.
How Sentiment Analysis Can Be Used:
- Automate systems to run sentiment analysis on all incoming customer support queries.
- Rapidly detect disgruntled customers and surface those tickets to the top.
- Route queries to specific team members best suited to respond.
- Use analytics to gain deep insight into what’s happening across your customer support.
Sentiment analysis is useful in customer support because it helps you do all of this:
- Prioritize order for responding to tickets, being sure to address the most urgent needs first.
- Increase efficiency by automatically assigning tickets to a particular category or team member.
Just for kicks, we decided to do some analysis on how the four biggest US phone carriers (AT&T, Verizon, Sprint, and T-Mobile) handled customer support interactions on Twitter. We downloaded tens of thousands of tweets mentioning the companies (by name or by handle), and ran them through a MonkeyLearn sentiment model to categorize each tweet as positive, neutral, or negative. We then used our new Insight Extractor, which reads all text as one unit, extracts the most relevant keywords, and returns the most relevant sentences including each keyword.
Here are some insights:
- T-Mobile had far and away the highest percentage of positive tweets.
- Verizon was the only company with more negative tweets than positive ones.
- Top keywords for positive tweets at Verizon included typical terms such as “new phone,” “thanks,” and “quality customer service.” Key sentences were typical, formal, somewhat dry interactions between the team and followers.
- Top keywords for positive tweets at T-Mobile included names of people on their customer support team, because their team runs higher-engagement, back-and-forth, about-anything conversations with followers.
To sum up, this could imply that a more personal, engaging take on social media elicits more positive responses and higher customer satisfaction.