When Word Embeddings Go Berserk.

Original article was published by mitra mirshafiee on Artificial Intelligence on Medium

As I was going through Andrew Ng’s Sequence Models course, I came across a paper called “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”. Because the topic was so intriguing and Andrew Ng’s explanation too short, I decided to explore it myself.

After searching the internet and reading a couple of papers, I put together a story-like tutorial that I want to share with you today.

First: machines are like babies

cute little robot

Just as kids watch their parents and then imitate their behavior, machines and algorithms watch us to figure out how they should behave. So if we set ourselves on fire, jump out of the window and land two floors below, or knock someone’s head off and then smile afterwards, they’ll probably think that we enjoyed the process and conclude that these were acceptable, even virtuous, deeds.

This is exactly the case (though a little milder) with how computers learn from our past mindsets, no matter how old and wrong they are, and then replicate them in our modern society. The old mindsets discussed in this paper are racial discrimination, gender bias and, in general, beliefs that pigeonhole people into categories with specific abilities, thereby preventing them from aspiring to new skills, careers, and even lifestyles.

Second: Babies are our next generation so take care of them

Just as we go out into society, take on different roles, and do our best to show our parents that we’ve learned what they taught us, machines try to show us that they can perform as well as we do. So let’s assume that we want to teach a robot how to interact with other people. When we train it on a large corpus of text from Google News, Wikipedia, and other sources that contain text from all over the world, it takes in as many nuances as it can for each word in its dictionary. So when we put our robot in a room with a white man, a woman, and an African American man, it may well hand the woman laundry detergent and the white man a beer, and guide the African American man to the fields to harvest cotton.

Third: I don’t care about my baby, so what?

You may say things like, “Come on! What does it matter if it gives the woman detergent? She can put it down and grab a beer instead,” or “Chill, it’s just a robot. So what?”

Well, the first thing to keep in mind is that machine learning algorithms are taking over the world, along with the systems we use daily, like advertising, social media, job search, and so on. They not only replicate our old thinking patterns but also amplify them by making biased mindsets more prevalent.

So if we don’t take care of our algorithms, they will continue labeling dark-skinned people as gorillas, showing female computer programmers and mathematicians job offers for cleaning and domestic work, assuming that all Muslims support terrorism, and so on.

Fourth: a brief history of how we created our babies (don’t get me wrong: algorithm and machine-learning babies)

Back in 2013, big brains at Google like Tomas Mikolov and his team were trying their best to make their search algorithms answer people’s questions better. After combining some older techniques, they invented an algorithm called word2vec, which is closely related in spirit to a category of algorithms called autoencoders. (Strictly speaking, word2vec is a shallow network trained to predict a word’s neighbors rather than to reconstruct its own input, but the compress-and-predict intuition carries over.) Autoencoders are essentially compressors: they take in one representation of our data, like the picture of an object, and output another representation that keeps the main aspects and features of the input. They consist of three main parts: an encoder, a latent space or bottleneck that holds the compressed data, and a decoder that tries to bring the compressed data back to its original format as closely as possible. More on that in another article. For this tutorial, all you have to keep in mind is that autoencoders are unsupervised deep learning algorithms that keep the most important aspects of the input and try to return an output that matches the input as closely as possible. This is why they were traditionally used for dimensionality reduction, although nowadays they have evolved and are used in many different areas of machine learning and deep learning.


After understanding the basic structure of autoencoders, we can understand Word2Vec much better. Like an autoencoder, Word2Vec is an unsupervised algorithm: it learns the most important aspects of each input by itself, without predefined labels. Therefore, when we give it a sentence, it outputs for each word a series of numbers representing the most important aspects of that word. These series of numbers are called word embeddings. For example, when we give it the sentence “a cute cat and dog”, it generates five different vectors whose values are inferred from the words that surround each word. They are numbers because computers work with numbers, not raw text. (Learn more about word embeddings here.)

What could embeddings do?

With all these cool embeddings at hand, people could do a lot of things. They could recommend products based on each person’s searches, find hidden connections between a person’s different interests, and even play with the algorithms by, for example, asking analogy questions.

The way computers answer these questions is quite simple. We know that each word has its own embedding, and each embedding is a series of numbers called a vector. For the sake of visualization, let’s assume that our word embeddings are only three-dimensional and represent only three aspects of each word.

How we form analogies

When we ask it “man” is to “woman” as “boy” is to what, it subtracts the vector of “man” from the vector of “woman” to get the difference between their meanings, adds this difference vector to the vector of “boy”, and then finds the word whose vector is closest to the result (ideally “girl”).
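The vector arithmetic behind such an analogy can be sketched in a few lines. This is only a toy illustration: the three-dimensional embeddings and the `analogy` helper are invented for this example, and real embeddings have hundreds of dimensions.

```python
import numpy as np

# Toy 3-dimensional embeddings, invented purely for illustration.
# Hypothetical axes: (gender, age, royalty).
embeddings = {
    "man":   np.array([ 1.0,  1.0, 0.0]),
    "woman": np.array([-1.0,  1.0, 0.0]),
    "boy":   np.array([ 1.0, -1.0, 0.0]),
    "girl":  np.array([-1.0, -1.0, 0.0]),
    "king":  np.array([ 1.0,  1.0, 1.0]),
    "queen": np.array([-1.0,  1.0, 1.0]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vocab):
    """Answer 'a is to b as c is to ?' by nearest neighbour of b - a + c."""
    target = vocab[b] - vocab[a] + vocab[c]
    # The three query words themselves are excluded from the candidates.
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(analogy("man", "woman", "boy", embeddings))
print(analogy("man", "woman", "king", embeddings))
```

With these made-up vectors the two queries come back as "girl" and "queen", because the gender axis is the only thing that differs within each pair.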

This is where it gets interesting. As scientists were having fun asking the machine different analogies and using them to infer meaning from text, they came across one analogy that got their attention. It was: “man” is to “computer programmer” as “woman” is to what?

Before seeing the answer, they expected something like “programmer” or “coder”. But the machine answered like a man stuck in a 1970s mindset, who thinks computers are sent from the gods above and can only be operated by chosen prophets, all of them male. It answered, “homemaker”.

After exploring a little bit more, they found other gender-neutral words that had vectors closer to one specific gender. Below is a list of these words.

Some gender inclined words

To examine this further and see whether gender pairs had anything in common, they chose 10 different gender pairs like (she, he), calculated the difference vector of each pair, and then ran PCA on these differences (the right-hand blue graph shows the top 3 principal components). For comparison, they also chose 10 random pairs of words 1,000 times and computed the principal components of their differences, averaged over the runs (the left-hand graph shows the top 3 principal components of these differences).

In the gender-pair plot, it’s evident that the first component is significantly bigger than the other two. The PCA of the random pairs, on the other hand, decays more gradually. From this we can conclude that the gender difference is something concrete: a single direction shared by all these pairs. With this direction at hand, we can remove the gender component from gender-neutral words and neutralize them with respect to gender. To do that, they first needed to project each word onto the gender subspace to identify the part of its meaning that contains this bias.
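To get a feel for what "the first component is much bigger" looks like, here is a small sketch on synthetic data. Everything in it is an assumption made for illustration: the "gender pair" differences are simulated as one shared direction plus a little noise, and the random pairs are plain Gaussian vectors, so the numbers say nothing about real embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Hypothetical shared "gender" direction (first coordinate axis).
g = np.zeros(dim)
g[0] = 1.0

def explained_variance_ratio(diffs):
    """Share of variance captured by each principal component.

    Centring each pair on its own mean leaves +d/2 and -d/2,
    so the stacked data already has zero mean.
    """
    stacked = np.vstack([diffs / 2, -diffs / 2])
    singular_values = np.linalg.svd(stacked, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

# 10 simulated gender-pair differences: shared direction + small noise.
gender_diffs = g + 0.05 * rng.standard_normal((10, dim))
# 10 simulated random-pair differences: no shared direction at all.
random_diffs = rng.standard_normal((10, dim))

gender_ratio = explained_variance_ratio(gender_diffs)
random_ratio = explained_variance_ratio(random_diffs)

print("gender pairs, top 3:", np.round(gender_ratio[:3], 3))
print("random pairs, top 3:", np.round(random_ratio[:3], 3))
```

In the simulated gender pairs almost all the variance lands in the first component, while in the random pairs it is spread out over many components.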

Two main steps for debiasing

Step 1: Identifying gender subspace

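W: the set of all words
Di: each defining set of words (for example, {she, he})
µi: mean of each defining set
C: the sum of the covariance matrices of the defining sets, each centered on its own mean µi
B: the bias subspace (the first 3 rows of SVD(C))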
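The definitions above can be turned into a short sketch. It assumes that C is built by centering each defining set on its own mean and summing the resulting covariance matrices, and that B is read off from the top rows of the SVD of C. The function name `bias_subspace` and the toy three-dimensional vectors are my own, chosen only for illustration.

```python
import numpy as np

def bias_subspace(defining_sets, k):
    """Return k orthonormal row vectors spanning the bias subspace.

    defining_sets: list of arrays, each of shape (set_size, dim).
    """
    dim = defining_sets[0].shape[1]
    C = np.zeros((dim, dim))
    for D in defining_sets:
        mu = D.mean(axis=0)          # mean of this defining set
        centered = D - mu            # centre the set on its own mean
        C += centered.T @ centered / len(D)
    # C is symmetric, so the rows of Vt are its principal directions,
    # ordered from largest to smallest singular value.
    _, _, Vt = np.linalg.svd(C)
    return Vt[:k]

# Toy vectors (invented): the pairs differ only along the first axis.
she, he = np.array([-1.0, 0.3, 0.2]), np.array([1.0, 0.3, 0.2])
woman, man = np.array([-0.9, -0.5, 0.1]), np.array([1.1, -0.5, 0.1])

B = bias_subspace([np.stack([she, he]), np.stack([woman, man])], k=1)
print(B)
```

In this toy case the recovered direction is the first axis (up to sign), which is exactly where the two pairs differ.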

Step 2a: Neutralizing and Equalizing

a quick definition of neutralizing and equalizing

After finding the bias direction, we have to identify which part of each word vector lies in the gender subspace (or direction) and then remove it from our gender-neutral words. First, to identify the bias in each word, we find the projection of its word vector onto the gender subspace. If you’re not familiar with the math, read the following:

Projecting a vector onto a subspace: Let’s say you want to project a vector v onto a subspace B. What you do is find the vector in B that is closest to v. As a simple example, if v is already in the subspace, its projection is v itself. In general, if b1, …, bk are orthonormal vectors spanning B, the projection of v onto B is vʙ = (v · b1) b1 + … + (v · bk) bk.

See the graph below for a better intuition:

Assume that the vectors spanning our subspace are stored in “plane_spans” and the vector we want to project is “vector”. The dot product of “vector” with each span gives the magnitude of the projection along that span, and multiplying that magnitude by the span gives a vector pointing in the same direction as the span. Summing these vectors over all spans gives the projection (this requires the spans to be orthonormal). Here’s a simple Python code for calculating the projection:
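The version below is a minimal sketch that assumes the rows of `plane_spans` are orthonormal (unit length and mutually perpendicular). The function name `project_onto_subspace` and the example numbers are my own choices for illustration.

```python
import numpy as np

def project_onto_subspace(vector, plane_spans):
    """Project `vector` onto the subspace spanned by the rows of `plane_spans`.

    Assumes the rows of `plane_spans` are orthonormal.
    """
    projection = np.zeros_like(vector, dtype=float)
    for span in plane_spans:
        # The dot product is the magnitude along this span;
        # multiplying by the span turns it back into a vector.
        projection += np.dot(vector, span) * span
    return projection

# The x-y plane inside 3-dimensional space.
plane_spans = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
vector = np.array([3.0, 4.0, 5.0])

projected = project_onto_subspace(vector, plane_spans)
print(projected)
```

Here the projection simply drops the z component, and projecting the result a second time changes nothing, as expected for a vector that is already in the subspace.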

  1. Neutralize: Removing the bias direction from all the gender-neutral words. For this step, we calculate the projection of each word in N (the group of words that have to be neutralized, like ‘computer programmer’, ‘babysitter’, etc.) onto the bias subspace (B). The projection is then subtracted from the word vector, and the result is the neutralized word embedding.
    N: words to neutralize
    wʙ: projection of each word vector onto B (the gender subspace)
  2. Equalize: Making sure that the words in every equality set are the same distance from each gender-neutral word. For example, if {‘grandmother’, ‘grandfather’} and {‘girl’, ‘boy’} are two equality sets, then after equalizing, ‘grandmother’ and ‘grandfather’ are equidistant from the word ‘babysit’, and so are ‘girl’ and ‘boy’.
    ε: the set of all equality sets
    E: an equality set
    µ: the mean of each equality set
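The neutralize step can be sketched as follows. The sketch assumes that the rows of B are orthonormal and that the result is rescaled to unit length after the projection is removed, which is how I read the paper. The helper names and the toy vectors are invented for illustration.

```python
import numpy as np

def project(w, B):
    """Projection of w onto the subspace spanned by the orthonormal rows of B."""
    return B.T @ (B @ w)

def neutralize(w, B):
    """Remove the bias component of w, then rescale to unit length."""
    w_perp = w - project(w, B)
    return w_perp / np.linalg.norm(w_perp)

# Toy setup: a one-dimensional gender subspace along the first axis.
B = np.array([[1.0, 0.0, 0.0]])
programmer = np.array([0.6, 0.8, 0.0])   # hypothetical biased vector

neutral_programmer = neutralize(programmer, B)
print(neutral_programmer)
```

After neutralizing, the toy ‘programmer’ vector has no component left along the gender axis.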

Equalize sets the part of each word that lies outside B to the simple average of its equality set, ν. As the authors of the paper suggest, it’s easier to look at each word vector w as two separate parts: wʙ, its projection onto B, and the part perpendicular to B, which equals w-wʙ. Outside B, all the word embeddings in an equality set are set to their average (ν = µ-µʙ). Within B, they are centered, and therefore moved to mean 0 (wʙ-µʙ), and then scaled (multiplied by the square-root term and divided by the norm in the denominator) so that each resulting vector has unit length.
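Here is a sketch of the equalize step under the same assumptions: B has orthonormal rows and the input vectors have unit length. The scaling factor sqrt(1 - ‖ν‖²) is my reading of the square-root term in the paper, and the toy ‘female’/‘male’ vectors are invented so that ‘female’ starts with the larger gender component.

```python
import numpy as np

def project(w, B):
    """Projection of w onto the subspace spanned by the orthonormal rows of B."""
    return B.T @ (B @ w)

def equalize(E, B):
    """Equalize one equality set E (rows are unit-length word vectors)."""
    mu = E.mean(axis=0)
    mu_B = project(mu, B)
    nu = mu - mu_B                              # shared part outside B
    scale = np.sqrt(1.0 - np.dot(nu, nu))       # keeps the result at unit length
    equalized = []
    for w in E:
        centered = project(w, B) - mu_B         # centre within B
        # Assumes the words do not all share the same projection onto B.
        equalized.append(nu + scale * centered / np.linalg.norm(centered))
    return np.array(equalized)

B = np.array([[1.0, 0.0, 0.0]])
female = np.array([-0.8, 0.6, 0.0])   # larger gender component
male = np.array([0.6, 0.8, 0.0])      # smaller gender component

equalized = equalize(np.stack([female, male]), B)
neutral_word = np.array([0.0, 1.0, 0.0])   # already neutralized
print(equalized)
print(neutral_word @ equalized[0], neutral_word @ equalized[1])
```

After equalizing, the two vectors are mirror images of each other along the gender axis and have the same inner product with the neutral word.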

Why word vectors are centered within the bias subspace

Well, let’s assume that we have a pair like {‘female’, ‘male’} and we want to equalize their distance to the word ‘programmer’. Suppose that when we project both of these words onto B, ‘female’ has a larger projection, and therefore a larger gender component. After centering these words in the bias subspace, they become symmetrically balanced.
As the authors tell us: if instead we simply scaled each vector’s component in the bias direction without centering, ‘male’ and ‘female’ would have exactly the same embeddings and we would lose analogies such as father : male :: mother : female.

After neutralizing and equalizing

After the previous two steps, we can observe that every pair of words (e1, e2) in an equality set (E) is equally distant from every gender-neutral word (w). Or, to put it in mathematical terms: w · e1 = w · e2 and ‖w-e1‖ = ‖w-e2‖.

Step 2b: Soft debiasing

In this step, we define a linear transformation T that minimizes the projection of each gender-neutral word onto the gender subspace while preserving, as much as possible, the similarity (inner products) between all the word vectors. In other words, we try to come up with a transformation that has the most effect on the gender-neutral words but changes the rest of the embeddings in W as little as possible. To express this intuition mathematically, we use inner products and minimize the following over T: ‖(TW)ᵀ(TW) - WᵀW‖² + λ‖(TN)ᵀ(TB)‖², where both norms are Frobenius norms and W, N, and B are the matrices of all word vectors, the gender-neutral word vectors, and the bias subspace.

The larger λ is, the more important the second part of this equation becomes, and therefore the more we focus on minimizing the projection of N onto B. With large enough values of λ, the result corresponds to the hard debiasing step. Finally, we normalize each transformed word vector so that it has unit length.
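To make the trade-off concrete, here is a sketch that only evaluates the soft debiasing objective for a given transformation; it does not perform the optimization itself. It assumes the objective has the form described above (a similarity-preservation term plus λ times a bias term, both squared Frobenius norms), with word vectors stored as matrix columns. The function name and the random toy data are invented for illustration.

```python
import numpy as np

def soft_debias_objective(T, W, N, B, lam):
    """Evaluate the (assumed) soft debiasing objective for a transform T.

    W: (dim, n_words)   all word vectors as columns
    N: (dim, n_neutral) gender-neutral word vectors as columns
    B: (dim, k)         vectors spanning the bias subspace as columns
    """
    TW = T @ W
    # How much the pairwise inner products change under T.
    similarity_term = np.linalg.norm(TW.T @ TW - W.T @ W, ord="fro") ** 2
    # How much the transformed neutral words still project onto the bias subspace.
    bias_term = np.linalg.norm((T @ N).T @ (T @ B), ord="fro") ** 2
    return similarity_term + lam * bias_term

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))       # toy embeddings
N = W[:, :2]                          # pretend the first two words are neutral
B = np.array([[1.0], [0.0], [0.0]])   # gender direction = first axis

identity = np.eye(3)                  # changes nothing, keeps all the bias
drop_gender = np.diag([0.0, 1.0, 1.0])  # removes the gender axis from everything

for lam in (0.1, 10.0):
    print(lam,
          soft_debias_objective(identity, W, N, B, lam),
          soft_debias_objective(drop_gender, W, N, B, lam))
```

The identity transform pays only the bias penalty, which grows with λ, while the transform that drops the gender axis pays only the similarity penalty, which does not depend on λ. That is the trade-off λ controls.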

How on earth can we distinguish between all gender-specific words and gender-neutral words in a massive dataset?

In this paper, the authors used the Google News embedding with 3 million words. That’s a lot of words, and they had to separate all the gender-neutral ones like ‘programmer’, ‘babysitter’, etc. from gender-specific ones like ‘businesswoman’, ‘actor’, etc. so that they could remove the bias only from the gender-neutral words. Because there are far fewer gender-specific words than gender-neutral ones, it’s much more efficient to list the gender-specific words and then treat everything else as gender-neutral.

To make the process easier, they first labeled around 200 gender-specific words in a subset of the Google News vocabulary of about 26,000 words, and then trained a linear SVM to find the rest of the gender-specific words by itself. In the end, they had around 6,500 gender-specific words in total.

Result of Debiasing

In the end, after experimenting with both hard debiasing and soft debiasing, they again generated pairs that were analogous to the (he, she) direction and asked people whether they thought the pairs were gender stereotypes. For both methods, the number of pairs identified as stereotypes decreased significantly. (For example, after hard debiasing the word2vec embedding trained on Google News, called w2vNEWS in the paper, the percentage of pairs recognized as stereotypes shrank from 19% to 6%.)