Practical Applications of Open AI’s GPT-2 Deep Learning Model

Source: Deep Learning on Medium

1. Text Generation ✍🏻

We can use the GPT-2 model to generate long texts. Like traditional language models, it outputs one token (roughly a word, or part of one) at a time. The output token is appended to the input tokens, and this new sequence serves as the input for generating the next token. This idea is called “auto-regression”.
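This loop is easy to sketch in Python. The `next_token` function below is a toy stand-in for the real network, which would score its entire vocabulary at each step:

```python
def next_token(tokens):
    # Toy stand-in for the model: real GPT-2 would compute a
    # probability over its whole vocabulary given `tokens` and
    # pick (or sample) one token from it.
    vocab = ["brown", "fox", "jumps", "over", "<eos>"]
    return vocab[len(tokens) % len(vocab)]

def generate(prompt_tokens, max_new_tokens=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)   # predict one token...
        if tok == "<eos>":
            break
        tokens.append(tok)         # ...append it, and feed the new sequence back in
    return tokens
```

Each new token is produced conditioned on everything generated so far, which is exactly what “auto-regression” means.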

gif source [4]

GPT-2 is a very large language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. Thanks to the diversity of this training data, it can generate conditional synthetic text samples of unprecedented quality: given an arbitrary text prompt, the model produces long continuations that often read as if written by a human.

According to OpenAI’s blog

The model is chameleon-like — it adapts to the style and content of the conditioning text. This allows the user to generate realistic and coherent continuations about a topic of their choosing.

You can play around with the GPT-2 model at the Talk to Transformer website 🔥. The official GPT-2 code is available in OpenAI’s GitHub repo.

So far we have talked about generating text with the original GPT-2 model. We can also fine-tune GPT-2 on our own datasets to generate custom texts. Neil Shepperd has created a fork of OpenAI’s repo with additional code for fine-tuning the existing OpenAI models on custom datasets. Here is a Colab notebook where you can fine-tune the 117M and 345M variants of GPT-2 using this fork.
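As a rough sketch of the data-preparation step involved (illustrative helper names, not the fork’s actual API), custom documents are typically joined into one stream with GPT-2’s `<|endoftext|>` delimiter and then cut into fixed-length training windows:

```python
END = "<|endoftext|>"

def build_corpus(documents):
    # Join documents into one long training stream, with GPT-2's
    # document delimiter between (and after) them.
    return END.join(documents) + END

def chunk(text, window=1024):
    # Slice the stream into fixed-length training windows. A real
    # pipeline slices token ids (GPT-2's context is 1024 tokens),
    # not characters, but the logic is the same.
    return [text[i:i + window] for i in range(0, len(text), window)]
```

The model then simply learns to continue these windows, which is why it picks up the style and structure of whatever dataset you feed it.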

After the release of the training code, developers started sharing texts generated by GPT-2 after fine-tuning it on various datasets. Gwern Branwen produced GPT-2 poetry, and Janelle Shane made GPT-2-generated Dungeons and Dragons character bios!

Keaton Patti shared on Twitter how he “trained an AI on 1,000 hours of Batman movies” and tweeted the first page of the movie script the AI generated. Justin Davis recorded a really cool audio version 👌🏻 of the script generated by Keaton’s bot.

Yep. GPT-2 turned out to be capable of all of it: writing poetry, movie scripts, and even video game character bios. Imagine the potential it holds across industries.

In the paper Fine-Tuning Language Models from Human Preferences, OpenAI describes how pre-trained language models can be fine-tuned with reinforcement learning rather than supervised learning, using a reward model trained from human preferences on text continuations. For stylistic continuation of input text, 5,000 human comparisons (each choosing the best of 4 continuations) resulted in a fine-tuned model that humans preferred over the zero-shot model 86% of the time.
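The reward model at the heart of that setup can be sketched as a softmax objective over the candidate continuations’ reward scores. This is a minimal, illustrative version of such a preference loss, not OpenAI’s actual training code:

```python
import math

def preference_loss(rewards, chosen):
    # Softmax cross-entropy over the candidates' reward scores:
    # training pushes the human-chosen continuation's reward above
    # the others, so the reward model learns to mimic the human rater.
    z = sum(math.exp(r) for r in rewards)
    return -math.log(math.exp(rewards[chosen]) / z)
```

With four equal rewards the loss is log 4 (the model is indifferent between the candidates); raising the chosen continuation’s reward drives the loss toward zero. The policy is then fine-tuned with reinforcement learning against this learned reward.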

2. Chatbots 🤖

Another great application of GPT-2 is conversational AI. Before the rise of deep-learning-based NLP techniques, it used to take months to design the rules and cover the conversation topics for a chatbot. Now, with the help of transfer learning and language models like GPT-2, we can build really good chatbots in a matter of days.

Thomas Wolf (from HuggingFace) explained in his blog post how his team fine-tuned GPT-2 to build a state-of-the-art dialog agent with a persona. They fine-tuned GPT-2 on the PERSONA-CHAT dataset, which contains conversations between randomly paired people.

The paired workers were asked to chat naturally and to get to know each other during the conversation. This produces interesting and engaging conversations that learning agents can try to mimic.

The dialog agent has a knowledge base that stores a few sentences describing its personality along with the dialog history. Whenever a new utterance arrives from the user, the agent combines it with this knowledge base to generate a response.
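As a rough sketch (with made-up special tokens, not HuggingFace’s exact input scheme), the model’s input at each turn might be assembled like this:

```python
def build_model_input(persona, history, user_utterance):
    # Flatten the persona facts, the running dialog history, and the
    # new user message into one flat sequence for the language model.
    # The special tokens here are illustrative; the real agent also
    # adds segment embeddings to mark who is speaking.
    parts = ["<bos>"] + persona + history + [user_utterance, "<speaker1>"]
    return " ".join(parts)
```

The trailing speaker token cues the model that it is now the bot’s turn, and GPT-2 simply continues the sequence to produce the reply.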

🌊Here is the demo of the chatbot.

3. Machine Translation 👂🏻

OpenAI published another paper describing how they tested their models’ performance on various natural language tasks using zero-shot task transfer.

As explained in this blog, “the zero-shot learning method aims to solve a task without receiving any examples of that task at the training phase.”

To help the model infer the translation task, the language model is conditioned on example pairs of the format “english sentence = french sentence”. Then, to translate a new English sentence, the model is given input of the form “english sentence =”. Samples are generated from the model with greedy decoding, and the first generated sentence is used as the translation.
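The prompt construction is easy to sketch; the helper below is illustrative, not OpenAI’s actual code:

```python
def make_translation_prompt(examples, sentence):
    # Condition the model on "english = french" pairs, then leave a
    # dangling "english sentence =" for the model to complete with
    # the translation.
    lines = [f"{en} = {fr}" for en, fr in examples]
    lines.append(f"{sentence} =")
    return "\n".join(lines)
```

Feeding this prompt to the model and decoding greedily, the first sentence it generates after the final “=” is taken as the translation.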

image source [4]

4. Text Summarization 🚀

GPT-2 can also be fine-tuned for the task of text summarization. (Strictly speaking, GPT-2 is a decoder-only language model rather than a seq2seq model, but summarization can be framed as plain text continuation.) Here the format of the data is very similar to what we saw in the translation task: “text = summary”.

The original paper describes how GPT-2’s summarization ability was tested using zero-shot task transfer, first on the CNN and Daily Mail dataset. To induce summarization behavior, the text “TL;DR:” was appended to the input, and the model was set to generate 100 tokens with top-k random sampling, using k=2. The low value of k reduces repetitiveness and encourages more abstractive summaries.
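Both ingredients, the “TL;DR:” hint and top-k sampling, are easy to sketch. The helpers below are illustrative (a real implementation samples from the model’s logits over its vocabulary, not a toy probability dict):

```python
import random

def tldr_prompt(text):
    # Append the "TL;DR:" hint that nudges GPT-2 toward summarizing.
    return text + "\nTL;DR:"

def top_k_sample(probs, k=2, rng=random):
    # Keep only the k most probable tokens, renormalize, and sample.
    # With k=2 (as in the paper) the choices are tightly constrained,
    # which cuts repetition while leaving a little variation.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    r = rng.random() * sum(p for _, p in top)
    for tok, p in top:
        r -= p
        if r <= 0:
            return tok
    return top[-1][0]
```

Running this sampler at every decoding step for 100 steps yields the summary, with all tokens outside the top 2 candidates excluded at each step.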

image source [4]

In the paper Fine-Tuning Language Models from Human Preferences that I talked about earlier, it is shown how the GPT-2 774M model was fine-tuned to summarize texts according to human preferences. The model was trained by combining supervised fine-tuning with fine-tuning on 60k human labels. As a result, the summaries from the supervised fine-tuned version of GPT-2 are more novel, as measured by n-grams or sentences, and more novel in terms of content; that is, they are not just copied from the input text.

Such functionality could be used to summarize a wide variety of content, from research papers to news media.


Isn’t it fascinating how, with clever fine-tuning, a single underlying architecture can perform so many tasks, from machine translation to writing poetry? And when such models are released as open source, development takes on a life of its own, as the many open-source developers and innovators demonstrating new ways of using GPT-2 have shown.

We are just at the dawn of the age of AI. Models like GPT-2 show just how well computers can now perform human-like tasks. At the current pace of technological advancement, AI might soon be a part of every aspect of our lives. Perhaps it already is.

I hope you found this post informative and insightful 👏🏻 🙏🏻.