Deep Learning series: Sentiment Classification

A great use of word embeddings (see previous blog) is sentiment classification.

It involves having the machine “read” a piece of text and deduce the writer’s attitude: whether or not they like what they are talking about.

One challenge is that we might not have a massive labeled training set. Luckily, with word embeddings, we can build a good sentiment classifier even with a small one.

One implementation uses a simple model. We take each word and, as usual, create its one-hot representation and multiply it by the embedding matrix E, which has been learned from a large text corpus, to extract the embedding vector for that word.
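As a minimal sketch of that lookup, here is the one-hot multiplication in NumPy. The toy vocabulary, the random matrix standing in for a learned E, and the rows-are-words layout are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary; a random matrix stands in for an embedding
# matrix E learned from a large corpus (one row per word, one column per feature).
vocab = ["the", "food", "was", "good", "bad"]
word_to_index = {word: i for i, word in enumerate(vocab)}
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 4))  # 5 words, 4-dimensional embeddings


def one_hot(word):
    """Return the one-hot vector for a word in the toy vocabulary."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec


# Multiplying the one-hot vector by E selects the row of E for that word.
e_good = one_hot("good") @ E

# The same vector can be read directly from E, which is cheaper in practice.
e_good_lookup = E[word_to_index["good"]]
```

Both routes give the same vector, which is why implementations usually index into E instead of doing the full matrix product.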

We then average (or sum) all the embedding vectors and feed the result to a softmax layer, which outputs a probability for each possible outcome, for example the star ratings from 1 to 5.
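Here is a rough sketch of that averaging model. The vocabulary, the embedding matrix E and the softmax weights W and b are random placeholders (in a real model E would be pre-trained and W, b learned from the labeled reviews), so the probabilities it prints are meaningless; only the shape of the computation matters.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["the", "dessert", "is", "excellent"]
word_to_index = {word: i for i, word in enumerate(vocab)}

E = rng.normal(size=(len(vocab), 4))  # placeholder for a pre-trained embedding matrix
W = rng.normal(size=(4, 5))           # placeholder softmax weights: 4 features -> 5 ratings
b = np.zeros(5)                       # placeholder softmax bias


def softmax(z):
    """Numerically stable softmax over a 1-D array of scores."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()


def predict_star_probs(sentence):
    """Average the word embeddings and map the result to 5 rating probabilities."""
    indices = [word_to_index[w] for w in sentence.lower().split()]
    avg = E[indices].mean(axis=0)
    return softmax(avg @ W + b)


probs = predict_star_probs("The dessert is excellent")
print(probs)  # five probabilities, one per star rating, summing to 1
```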

The problem with this simple model is that it ignores word order. Because it only sees the sum or average of the word vectors, it can fail to recognize a negative review that is full of positive words.

An example could be:

“The restaurant was lacking in good service, good ambiance, good food and good price”. The averaged vector is dominated by the repeated word “good”, so the model above is likely to classify this sentence as a positive review. We know better, though!
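To see why averaging loses word order, here is a small sketch with two made-up sentences that contain exactly the same words but mean opposite things. The vocabulary and the random embeddings are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab = ["service", "was", "good", "not", "bad"]
word_to_index = {word: i for i, word in enumerate(vocab)}
E = rng.normal(size=(len(vocab), 4))  # random stand-in for learned embeddings


def average_embedding(sentence):
    """Average the embedding vectors of the words in the sentence."""
    indices = [word_to_index[w] for w in sentence.split()]
    return E[indices].mean(axis=0)


# Same words, different order, opposite sentiment.
positive = average_embedding("service was good not bad")
negative = average_embedding("service was bad not good")

# The averaged inputs are identical, so any classifier on top of them
# must give both sentences the same prediction.
same_input = np.allclose(positive, negative)
```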

Instead of using the average or sum vector, we can use an RNN.

This will have the many-to-one architecture: it reads the words in order and is able to “listen” to the whole sentence before producing the output.
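A bare-bones sketch of such a many-to-one RNN forward pass could look like this. It uses a plain tanh RNN cell with random, untrained weights; the layer sizes and all parameter names are assumptions, and a real model would learn the weights (and might use a GRU or LSTM cell instead).

```python
import numpy as np

rng = np.random.default_rng(3)
n_features, n_hidden, n_classes = 4, 8, 5

# Untrained placeholder parameters for a plain tanh RNN cell.
W_xh = rng.normal(scale=0.5, size=(n_features, n_hidden))
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
W_hy = rng.normal(scale=0.5, size=(n_hidden, n_classes))
b_y = np.zeros(n_classes)


def softmax(z):
    """Numerically stable softmax over a 1-D array of scores."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()


def rnn_sentiment(embeddings):
    """Read the word embeddings in order; classify from the last hidden state."""
    h = np.zeros(n_hidden)
    for x in embeddings:                      # one step per word
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
    return softmax(h @ W_hy + b_y)            # single output for the whole sentence


# Six random vectors stand in for the embeddings of a six-word review.
sentence = rng.normal(size=(6, n_features))
probs = rnn_sentiment(sentence)
reversed_probs = rnn_sentiment(sentence[::-1])  # same words, reversed order
```

Unlike the averaging model, feeding the same word vectors in a different order changes the hidden state, so the two outputs generally differ.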

Debiasing word embeddings

We have seen how the embedding matrix is trained on a big text corpus, which ultimately defines the features for each word. This is done by training on a large amount of text taken from the internet.

We can expect that the machine, like a toddler learning from the “adults”, will pick up many things from this big corpus, including some bias.

Luckily, researchers have been working on fixing these biases in word embeddings, with some success.

Let’s take gender bias as an example and see what tools we have to remove it from word embeddings.

1- Identify the bias direction:

For gender bias, we take the embedding vector for “he” and subtract the embedding vector for “she”, because the only difference between the two words is the gender.

We do this for a few more pairs of words where the only discriminant is the gender (e.g., “male”-“female”, “boy”-“girl”…) and average the differences.

Averaging these differences gives us an estimate of the direction of bias.

The bias direction is therefore a 1-dimensional subspace; with 300-dimensional embeddings, the non-bias directions form the remaining 299-dimensional subspace.
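The averaging step can be sketched in a few lines. The embeddings here are random 300-dimensional vectors standing in for real pre-trained ones, and the list of pairs is just the handful of examples mentioned above.

```python
import numpy as np

rng = np.random.default_rng(4)
words = ["he", "she", "male", "female", "boy", "girl"]

# Random stand-ins for real pre-trained 300-dimensional embeddings.
embeddings = {word: rng.normal(size=300) for word in words}

# Pairs whose only intended difference is gender.
pairs = [("he", "she"), ("male", "female"), ("boy", "girl")]

# One difference vector per pair, then the average across pairs.
differences = [embeddings[a] - embeddings[b] for a, b in pairs]
g = np.mean(differences, axis=0)

# Unit-length version of the bias direction, handy for projections.
g_unit = g / np.linalg.norm(g)
```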

There is another, more complicated way to find the bias direction, which allows it to be a subspace of more than one dimension. It uses the SVD (singular value decomposition) algorithm, which is based on an idea similar to PCA (principal component analysis), used in machine learning for dimensionality reduction.
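For the curious, here is a loose sketch of how an SVD could produce a bias subspace with more than one dimension. This is only an illustration of the idea, not the exact published procedure: the difference vectors are random placeholders and the choice of k = 2 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

# Random placeholders for the difference vectors of 10 gendered word pairs,
# one row per pair, 300 dimensions each.
differences = rng.normal(size=(10, 300))

# The rows of Vt are orthonormal directions, ordered by how much of the
# variation in the difference vectors they capture.
U, S, Vt = np.linalg.svd(differences, full_matrices=False)

k = 2                     # assumed size of the bias subspace
bias_subspace = Vt[:k]    # shape (k, 300), orthonormal rows
```

With k = 1 this reduces to a single direction, much like the simple averaging approach.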

But let’s keep things simple, and move on to the next step.

2- Neutralize:

In some words, called “definitional words”, the gender is intrinsic (e.g., girl, boy, …), while others we want to be gender-neutral (e.g., doctor, babysitter, …).

To neutralize those words, we remove the component of their embedding along the bias direction, which amounts to projecting the vector onto the non-bias subspace.
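In code, the projection is a one-liner. This sketch assumes a single bias direction g and uses random vectors in place of real embeddings; the function name neutralize is just a label chosen here.

```python
import numpy as np


def neutralize(e, g):
    """Remove from e its component along the bias direction g."""
    e_bias = (e @ g) / (g @ g) * g   # projection of e onto g
    return e - e_bias                # what is left lies in the non-bias subspace


rng = np.random.default_rng(6)
g = rng.normal(size=300)          # stand-in for the bias direction
e_doctor = rng.normal(size=300)   # stand-in for the embedding of "doctor"

e_doctor_debiased = neutralize(e_doctor, g)

# After neutralizing, the vector has (numerically) no component along g.
remaining_bias = e_doctor_debiased @ g
```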

3- Equalize pairs:

For pairs of definitional words such as boy/girl or grandmother/grandfather, we want the only difference between their embeddings to be the gender. Therefore, both words in a pair should be at the same distance from any neutralized word, such as “doctor” or “babysitter”.
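One way to sketch the equalize step, assuming a single bias direction g and unit-length embeddings, is to give both words the same non-bias component and mirror-image components along g. The vectors below are random placeholders and the helper names are made up for this example.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def equalize(e1, e2, g):
    """Make e1 and e2 differ only along the bias direction g.

    Assumes e1 and e2 have unit length and g is a single bias direction.
    """
    g_hat = unit(g)
    mu = (e1 + e2) / 2                      # midpoint of the pair
    mu_perp = mu - (mu @ g_hat) * g_hat     # shared non-bias component
    # Size of the bias component that keeps each result at unit length.
    scale = np.sqrt(abs(1 - mu_perp @ mu_perp))
    sign = np.sign((e1 - mu) @ g_hat)       # keep e1 on its original side of g
    return mu_perp + sign * scale * g_hat, mu_perp - sign * scale * g_hat


rng = np.random.default_rng(7)
g = rng.normal(size=300)
e_boy = unit(rng.normal(size=300))
e_girl = unit(rng.normal(size=300))

# A neutralized word: its component along g has been removed.
e_doctor = rng.normal(size=300)
e_doctor = e_doctor - (e_doctor @ unit(g)) * unit(g)

eq_boy, eq_girl = equalize(e_boy, e_girl, g)
dist_boy = np.linalg.norm(eq_boy - e_doctor)
dist_girl = np.linalg.norm(eq_girl - e_doctor)
```

After this step the two distances match, so “doctor” is no closer to one word of the pair than to the other.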

Finally, how do we know which words to neutralize? Some researchers trained a classifier to figure out which words are definitional and which are not. Most words in the English language are not definitional, so most words get neutralized. The word pairs we need to equalize, on the other hand, are a relatively small subset, so it is feasible to hand-pick and adjust them.

This blog is based on Andrew Ng’s lectures at

Source: Deep Learning on Medium