Imagine we have a vocabulary of 50,000 words and we feed them to a model one at a time. The traditional technique is to one-hot encode every word and feed that vector into your algorithm. But this means that 49,999 elements of each vector are zero, which is very inefficient from a computational point of view.
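To make the sparsity concrete, here is a minimal sketch of one-hot encoding. The seven-word vocabulary and the helper name `one_hot` are assumptions made for illustration; with a real 50,000-word vocabulary the same code would produce 49,999 zeros per word.

```python
# Toy vocabulary (an assumption for illustration); a real one would be far larger.
vocab = ["the", "color", "of", "box", "is", "red", "blue"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a list of zeros with a single 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector


red_vector = one_hot("red")
zero_count = red_vector.count(0)

print(red_vector)  # [0, 0, 0, 0, 0, 1, 0]
print(zero_count)  # 6, i.e. every element except one
```

Notice also that any two one-hot vectors are equally different from each other, so this encoding says nothing about which words are related.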

Another concern when working with words is that we want our algorithm to understand the relationships between them, i.e. we want it to understand that red, blue and green are colors, or at least that they are related to each other in some way.

The solution to both of these problems is word2vec embedding. The concept is based on the observation that similar words appear in similar contexts. For example:

  1. The color of the box is red
  2. The color of the box is blue

In both of the above sentences, red and blue describe the color of the box (which is the context).
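The idea that red and blue share a context can be checked with a few lines of code. This is only a sketch: the helper `contexts_of` is a hypothetical name, and "context" here simply means the whole sentence with the word masked out, which is cruder than what word2vec actually uses.

```python
sentences = [
    "the color of the box is red",
    "the color of the box is blue",
]


def contexts_of(target, sentences):
    """Collect every sentence in which target appears, with target replaced by '_'."""
    found = set()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == target:
                found.add(tuple(tokens[:i] + ["_"] + tokens[i + 1:]))
    return found


red_contexts = contexts_of("red", sentences)
blue_contexts = contexts_of("blue", sentences)

print(red_contexts == blue_contexts)  # True: both words fill the same slot
```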

The next question is how to implement this idea. There are two methods:

  1. Continuous bag of words (CBOW)
  2. Skip-gram

As shown in the figure, in CBOW we take the context as input and try to predict the target word as output.
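As a rough sketch of what the CBOW training data looks like, the snippet below builds (context, target) pairs from one sentence. The function name `cbow_pairs` and the window size of 2 are assumptions chosen for illustration, not details taken from any particular implementation.

```python
def cbow_pairs(tokens, window=2):
    """Build (context_words, target_word) pairs: the context is the input, the target is the label."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:
            pairs.append((context, target))
    return pairs


cbow_tokens = "the color of the box is red".split()
cbow_examples = cbow_pairs(cbow_tokens, window=2)

print(cbow_examples[-1])  # (['box', 'is'], 'red')
```

A model trained on such pairs would receive the words `box` and `is` and be asked to predict `red`.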

In skip-gram we take the word as input and try to predict its context as output.
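The skip-gram training data can be sketched the same way, with the direction reversed: each pair is (input word, one context word). The function name `skipgram_pairs` and the window size of 2 are assumptions for illustration; the linked notebook may build its pairs differently.

```python
def skipgram_pairs(tokens, window=2):
    """Build (center_word, context_word) pairs: the center is the input, each neighbor is a label."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


skipgram_tokens = "the color of the box is red".split()
skipgram_examples = skipgram_pairs(skipgram_tokens, window=2)

print(skipgram_examples[-2:])  # [('red', 'box'), ('red', 'is')]
```

Because one word yields several training pairs, skip-gram produces more examples per sentence than CBOW does.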

In both cases we are interested in the values (the vector) that each word is assigned. The diagram below shows the output of a skip-gram model I trained. In it we can observe that words with similar meanings appear close to each other (like man and woman, left and right).
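"Close to each other" is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; they are not the output of the model described here, and real embeddings usually have a hundred or more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, invented for this example only.
embeddings = {
    "man": [0.9, 0.1, 0.3],
    "woman": [0.8, 0.2, 0.3],
    "left": [-0.2, 0.9, -0.4],
}

sim_man_woman = cosine_similarity(embeddings["man"], embeddings["woman"])
sim_man_left = cosine_similarity(embeddings["man"], embeddings["left"])

print(sim_man_woman > sim_man_left)  # True for these toy vectors
```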

It is generally observed that CBOW trains faster, while skip-gram produces a more accurate vector space.

You can find my implementation of Word2Vec (skip-gram) here: https://github.com/ck2886/Word2Vec_SkipGram/blob/master/Skip-Gram_word2vec.ipynb

Source: Deep Learning on Medium