The Current State of the Art in Natural Language Generation (NLG)

Natural Language Generation (NLG) deals with the automatic generation of natural language text based on contexts provided as inputs. NLG sees widespread use in applications such as automatic summarization (generating succinct summaries of long write-ups), story and poem generation, image captioning, and so on.

State-of-the-art techniques in NLG:

Markov Chains: Markov chains are among the earliest algorithms used for language generation. They predict the next word in a sentence using only the current word. For example, suppose a model is trained on just two sentences: “I drink coffee in the morning” and “I eat sandwiches with tea”. There is a 100% chance it predicts “coffee” after “drink”, while “I” has a 50% chance of being followed by “drink” and a 50% chance of being followed by “eat”. A Markov chain tabulates the transitions between each pair of words to calculate the probability of the next word. Markov chains were used in earlier versions of smartphone keyboards to generate suggestions for the next word in a sentence.
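A minimal sketch of such a first-order (bigram) Markov chain in Python, trained on the two example sentences above (the corpus and function names are illustrative, not from any particular library):

```python
import random
from collections import defaultdict

# Train: record which words follow each word in the corpus.
corpus = ["I drink coffee in the morning", "I eat sandwiches with tea"]
transitions = defaultdict(list)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        transitions[current_word].append(next_word)

# Generate: repeatedly sample a successor of the current word.
# "drink" is always followed by "coffee"; "I" is followed by "drink"
# or "eat" with equal probability, matching the 50/50 split above.
def generate(start, max_words=6):
    word, output = start, [start]
    for _ in range(max_words):
        if word not in transitions:
            break
        word = random.choice(transitions[word])
        output.append(word)
    return " ".join(output)

print(generate("I"))  # e.g. "I drink coffee in the morning"
```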

RNNs and LSTMs: Vanilla RNNs and LSTM networks can also be used for natural language generation, in a task similar to language modelling (predicting the next word in a sentence given the previous words as input); by feeding each predicted word back in as the next input, the network produces text by itself.
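A minimal word-level LSTM language model sketch in PyTorch (the class name and hyperparameter values are illustrative, not from the article):

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # word ids -> vectors
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)      # scores over the vocabulary

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) integer word ids
        emb = self.embed(tokens)
        output, state = self.lstm(emb, state)  # `state` carries the memory forward
        logits = self.out(output)              # (batch, seq_len, vocab_size)
        return logits, state
```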

At every iteration, the RNN stores in its memory the previous words it has encountered and calculates the probability of the next word. For example, if the model has generated the text “We need to rent a ___”, it now has to figure out the next word in the sentence. “House” or “car” will have a higher probability than words like “river” or “dinner”. For every word in the dictionary, the model assigns a probability based on the previous words it has seen. The word with the highest probability is selected and stored in the memory, and the model then proceeds with the next iteration.
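A greedy decoding sketch using the model above (`word2id` and `id2word` are a hypothetical vocabulary mapping, not part of the original article):

```python
def generate(model, word2id, id2word, prompt_words, max_words=20):
    model.eval()
    state = None
    tokens = torch.tensor([[word2id[w] for w in prompt_words]])
    generated = list(prompt_words)
    with torch.no_grad():
        for _ in range(max_words):
            logits, state = model(tokens, state)
            next_id = logits[0, -1].argmax().item()  # pick the highest-probability word
            generated.append(id2word[next_id])
            tokens = torch.tensor([[next_id]])       # feed the prediction back in
    return " ".join(generated)
```

In practice, sampling from the probability distribution rather than always taking the argmax tends to produce more varied, natural-sounding text.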

Transformers: More recently, Transformers have also been used for language generation. One of the most well-known examples is OpenAI’s GPT-2 language model. The model learns to predict the next word in a sentence by using self-attention to focus on the previously seen words that are most relevant to that prediction.
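A sketch of sampling text from the publicly released GPT-2 weights via the Hugging Face `transformers` library (assumes `transformers` and `torch` are installed; the prompt and sampling parameters are illustrative):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer.encode("We need to rent a", return_tensors="pt")

# Sample a continuation; top_k keeps only the 50 most likely next
# tokens at each step instead of always taking the single best one.
output = model.generate(input_ids, max_length=30, do_sample=True, top_k=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```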

[Figure: relationships determined by the self-attention mechanism in transformers]

The T-NLG model: In February 2020, Microsoft announced a new model called Turing Natural Language Generation (T-NLG), reported as the largest transformer-based language model created to date, with 17 billion parameters. To put this into perspective, OpenAI’s GPT-2, with 1.5 billion parameters, was considered the largest model when it was released in 2019; T-NLG has over ten times as many parameters.

[Figure: comparison of parameter counts across several recent language models]

T-NLG is a Transformer-based generative language model, which means it can generate words to complete open-ended textual tasks. In addition to completing an unfinished sentence, it can generate direct answers to questions and summaries of input documents.

There were numerous software and hardware challenges that Microsoft had to overcome to train this breakthrough model, including providing fast communication between the many GPUs used during training. Turing NLG’s results on standard NLP benchmarks using the pre-trained model are summarized below.

On the LAMBADA dataset, performance is measured as next-word prediction accuracy, where higher is better. WikiText-103 uses perplexity to evaluate the model’s probability distribution, where a lower number indicates the distribution is better at predicting the sample.
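As a quick illustration of the perplexity metric, it is the exponential of the average negative log-likelihood the model assigns to the true next words (the probability values below are made up):

```python
import math

# Hypothetical probabilities a model assigned to each actual next word.
probs = [0.25, 0.10, 0.50, 0.05]

avg_nll = -sum(math.log(p) for p in probs) / len(probs)
perplexity = math.exp(avg_nll)
print(perplexity)  # lower is better; 1.0 would mean perfect prediction
```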

Deep learning in computer vision had its ImageNet moment in 2012. NLG reached a similar moment in 2019, when the revolution began with unsupervised models like GPT-2 and, soon after, T-NLG making significant breakthroughs across natural language generation tasks. The golden era of NLG has only just begun.