Original article was published on Artificial Intelligence on Medium
Recent research at Google and Facebook has focused on the behind-the-scenes mechanisms of text prediction. Besides the motivation for using Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, two word2vec models for generating word embeddings were also discussed.
Exploring the evolution of deep learning, I highlighted some of the areas where these techniques are applied within Google, one of the major players in the field. So, thanks to deep learning.
There are two ways of developing the word2vec model: continuous bag-of-words (CBOW) and skip-gram.
To understand this in more detail, let's first take a closer look at the CBOW and skip-gram models.
Continuous skip-gram & bag-of-words
In terms of underlying algorithms, both models are similar, slightly differing only in the approach: CBOW predicts target words from source context words, while skip-gram does the inverse and predicts source context-words from the target words. Simply put, the CBOW model uses surrounding words to predict the one of interest, while the skip-gram model uses the central word to define the surrounding ones.
Other minor differences include the absence of a hidden layer in the skip-gram architecture and the lack of strict word-order rules in CBOW. According to this TensorFlow tutorial, CBOW is more useful for smaller datasets, while skip-gram performs better on larger ones.
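The two prediction directions can be made concrete by looking at the training pairs each model generates. Below is a minimal sketch using a toy sentence and a context window of 1 (both the sentence and the window size are arbitrary choices for illustration):

```python
# Generate (context, target) training pairs for CBOW and skip-gram
# from a toy sentence with a window size of 1.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
window = 1

cbow_pairs = []       # CBOW: surrounding words -> central word
skipgram_pairs = []   # skip-gram: central word -> one surrounding word

for i, target in enumerate(tokens):
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    cbow_pairs.append((context, target))
    for ctx in context:
        skipgram_pairs.append((target, ctx))

print(cbow_pairs[1])       # (['the', 'sat'], 'cat')
print(skipgram_pairs[:2])  # [('the', 'cat'), ('cat', 'the')]
```

Note that a skip-gram example is emitted per context word, so skip-gram produces more training pairs from the same text than CBOW does.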
Implementation steps for text generation:
1. Cleaning input data
2. Building vocabulary
3. Converting text to word vectors
4. Defining the model (encoder-decoder)
5. Training the model
6. Generating text
The step-by-step process above leads to our end goal of text prediction. Cleaning the data involves tokenization, lemmatization, and stemming. To convert text to word vectors, we need to consider the following steps:
- including all the words in the vocabulary
- designing a unique mapping of each word to an index
So, the word mapping technique converts the input text into word vectors.
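These first three steps can be sketched in a few lines of plain Python. The sample text is made up for illustration, and the cleaning here is only lowercasing and tokenization (a real pipeline would also lemmatize or stem, e.g. with NLTK or spaCy):

```python
import re

raw = "The cat sat on the mat. The cat slept."

# 1. Cleaning: lowercase and tokenize.
tokens = re.findall(r"[a-z]+", raw.lower())

# 2. Building the vocabulary: every word gets a unique index.
vocab = sorted(set(tokens))
word_to_index = {word: i for i, word in enumerate(vocab)}
index_to_word = {i: word for word, i in word_to_index.items()}

# 3. Converting the text into a sequence of word indices; the
#    embedding layer later looks up a word vector for each index.
sequence = [word_to_index[w] for w in tokens]

print(word_to_index)  # {'cat': 0, 'mat': 1, 'on': 2, 'sat': 3, 'slept': 4, 'the': 5}
print(sequence)       # [5, 0, 3, 2, 5, 1, 5, 0, 4]
```

The `index_to_word` reverse mapping is what turns the model's predicted indices back into readable words during generation.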
When defining the model, the following preconditions must be satisfied:
- using a word embedding to embed the input into 2D arrays
- using a decoder (with an encoder) comprising RNNs and LSTMs to make predictions
- applying some fully connected layers on top of the decoder output to generate the actual predictions
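Putting those preconditions together, a model definition might look like the following minimal sketch in Keras (the layer sizes, vocabulary size, and sequence length are all arbitrary assumptions, not values from the original article):

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 5000   # assumed vocabulary size
embed_dim = 128     # assumed embedding dimension
seq_len = 20        # assumed number of input time steps

model = keras.Sequential([
    keras.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, embed_dim),         # embed word indices into 2D arrays
    layers.LSTM(256),                                # recurrent layer over the time steps
    layers.Dense(256, activation="relu"),            # fully connected layer on top
    layers.Dense(vocab_size, activation="softmax"),  # distribution over the next word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

The final softmax outputs one probability per vocabulary entry, so predicting the next word reduces to a classification problem over the vocabulary.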
During training, the LSTM receives the input at each time step and predicts the class of the next word.
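Once the model predicts a distribution over next words, generating text is just repeated sampling. The sketch below replaces the trained network with a hypothetical hand-written transition table so the loop itself is visible:

```python
import random

# Stand-in for a trained next-word model: maps the last word to a
# probability distribution over a tiny vocabulary (made-up data).
transitions = {
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 1.0},
    "sat": {"on": 1.0},
    "on":  {"the": 1.0},
    "mat": {"the": 1.0},
}

def generate(seed, steps, rng):
    """Sample `steps` words, each conditioned on the previous one."""
    words = [seed]
    for _ in range(steps):
        dist = transitions[words[-1]]
        choices, weights = zip(*dist.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the", 4, random.Random(0)))
```

With a real model, the `transitions` lookup is replaced by a forward pass producing the softmax distribution over the vocabulary.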
You can find the source code for each of these steps in Sangramsingkayte.