Language Prediction & Text Generation

How does your favorite search engine complete your search queries? How does your phone’s keyboard predict what you’ll type next? Language prediction is an application of NLP concerned with predicting text. Autosuggest, autocomplete, and suggested replies are common applications of language prediction.

The first step in language prediction is picking a language model. Bag of words alone is generally not a great model for language prediction: because it ignores word order, no matter what the preceding word was, you will just get one of the most commonly used words from your training data.
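
As a toy illustration of that limitation (the corpus and function name below are mine, not from the article), a bag-of-words "predictor" can only fall back on overall word frequency:

```python
from collections import Counter

# A bag-of-words model ignores word order entirely, so its best guess for
# the next word is always just the most frequent word in the training text,
# no matter what came before.
training_text = "the cat sat on the mat and the dog sat on the rug"
word_counts = Counter(training_text.split())

def predict_next_word_bow(preceding_words):
    # The preceding words are ignored -- that is exactly the problem.
    return word_counts.most_common(1)[0][0]

print(predict_next_word_bow(["cheese"]))  # always "the"
```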

If you go the n-gram route, you will most likely rely on Markov chains to predict the statistical likelihood of each following word or character based on the training corpus. Markov chains are memoryless: they make statistical predictions based entirely on the current n-gram at hand.

For example, let’s take a sentence beginning, “I ate so many cheese”. Using a trigram model (where n is 3), a Markov chain conditions on the previous two words, “many cheese”, and would predict the following word as “burgers” based on the number of times the sequence “many cheese burgers” has appeared in the training data out of all the times “many cheese” has appeared in the training data.
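
Here is a minimal sketch of that idea, assuming a toy corpus and helper names of my own (a real model would be trained on a much larger collection):

```python
import random
from collections import defaultdict, Counter

# Minimal trigram (n = 3) Markov chain: map each pair of consecutive words
# to a count of the words that followed them in training, then predict the
# next word in proportion to those counts.
corpus = "i ate so many cheese burgers i ate so many cheese fries".split()

transitions = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    transitions[(w1, w2)][w3] += 1

def predict_next(w1, w2):
    followers = transitions[(w1, w2)]
    if not followers:
        return None
    # Probability of each candidate = its count / total count for this bigram.
    words, counts = zip(*followers.items())
    return random.choices(words, weights=counts)[0]

print(predict_next("many", "cheese"))  # "burgers" or "fries"
```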

A more advanced approach, using a neural language model, is the Long Short-Term Memory (LSTM) model. LSTMs use deep learning with a network of artificial “cells” that manage memory, making them better suited to text prediction than traditional neural networks.
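
As a rough sketch of how such a model might be set up (the Keras architecture, hyperparameters, and the lyrics.txt file below are assumptions of mine, not the article’s actual code):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Character-level LSTM language model: learn to predict the next character
# from the previous seq_len characters.
text = open("lyrics.txt").read().lower()   # hypothetical training file
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

seq_len = 40
X, y = [], []
for i in range(len(text) - seq_len):
    X.append([char_to_idx[c] for c in text[i:i + seq_len]])
    y.append(char_to_idx[text[i + seq_len]])
X, y = np.array(X), np.array(y)

model = keras.Sequential([
    layers.Embedding(input_dim=len(chars), output_dim=32),
    layers.LSTM(128),                       # memory cells that carry context
    layers.Dense(len(chars), activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, batch_size=128, epochs=20)
```

Text is then generated by repeatedly feeding the model a seed sequence and sampling the next character from the softmax output, which is how output like the lyrics below can be produced.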

The following songs were used as training data for the model:

document1: Look What You Made Me Do – Taylor Swift

document2: Right Now, Na Na – Akon

document3: Baby – Justin Bieber

Generated text:
i m all gone yeah now na i think about is in the street and now na na i m in your little games don t trust nobody came between us two and threw a beat when i got harder in the old taylor can t cool no i got

There is a vast array of applications for NLP. However, as they say, “with great language processing comes great responsibility” (or something along those lines). When working with NLP, we have several important considerations to take into account.

Many powerful NLP tools come head-to-head with privacy concerns. Who is collecting your data? How much data is being collected, and what do those companies plan to do with it?

You can have the best intentions and still accidentally program a biased tool. While NLP can limit bias, it can also propagate bias. As an NLP developer, it’s important to consider biases, both within your code and within the training corpus. A machine will learn the same biases you teach it, whether intentionally or unintentionally.