Asking the Right Questions: Training a T5 Transformer Model on a New Task

Original article was published on Artificial Intelligence on Medium

I’ve been itching to try the T5 (Text-To-Text Transfer Transformer) ever since it came out way, way back in October 2019 (it’s been a long couple of months). I messed around with the open-sourced code from Google a couple of times, but I never managed to get it to work properly. Some of it went a little over my head (TensorFlow 😫 ) so I figured I’d wait for Hugging Face to ride to the rescue! As always, the Transformers implementation is much easier to work with, and I adapted it for use with Simple Transformers.

Before we get to the good stuff, a quick word on what the T5 model is and why it’s so exciting. According to the article on T5 in the Google AI Blog, the model is a result of a large-scale study (paper link) on transfer learning techniques to see which works best. The T5 model was pre-trained on C4 (Colossal Clean Crawled Corpus), a new, absolutely massive dataset, released along with the model.

Pre-training is the first step of transfer learning in which a model is trained on a self-supervised task on huge amounts of unlabeled text data. After this, the model is fine-tuned (trained) on smaller labelled datasets tailored to specific tasks, yielding far superior performance compared to simply training on the small, labelled datasets without pre-training. Further information on pre-training language models can be found in my post below.

A key difference in the T5 model is that all NLP tasks are presented in a text-to-text format. BERT-like models, by contrast, take a text sequence as input and output a single class label or a span of the input text. A BERT model is retrofitted for a particular task by adding a relevant output layer on top of the transformer model; for example, a simple linear classification layer is added for classification tasks. T5 eschews this approach and instead reframes every NLP task such that both the input and the output are text sequences. This means the same T5 model can be used for any NLP task, without any aftermarket changes to the architecture. The task to be performed is specified via a simple prefix (again, a text sequence) prepended to the input, as demonstrated below.


The T5 paper explores many of the recent developments in NLP transfer learning. It is well worth a read!

However, the focus of this article is on adapting the T5 model to perform new NLP tasks. Thanks to the unified text-to-text approach, this turns out to be (surprisingly) easy. So, let’s get to the aforementioned good stuff!

The Task

The T5 model is trained on a wide variety of NLP tasks including text classification, question answering, machine translation, and abstractive summarization. The task we will be teaching our T5 model is question generation.

Specifically, the model will be tasked with asking relevant questions when given a context.

You can find all the scripts used in this guide in the examples directory of the Simple Transformers repo.

The Dataset

We will be using the Amazon Review Data (2018) dataset which contains (among other things) descriptions of the various products on Amazon and question-answer pairs related to those products.

The descriptions and the question-answer pairs must be downloaded separately. You can either download the data manually by following the instructions in the Descriptions and Question-Answer Pairs sections below, or you can use the provided shell script. The list of categories used in this guide is given below.


Descriptions

  1. Go to the reviews URL.
  2. Download the metadata files (json.gz) from the links on the page. Note that it is better to download from the Per-category data links rather than downloading the full metadata for all products. The full metadata is a 24 GB archive, and you will need a lot of RAM to process it.
  3. Rename meta_ALL_Beauty.json.gz to meta_Beauty.json.gz to match the name in the question-answer file.

Question-Answer Pairs

  1. Go to the qa URL.
  2. Download the Per-category files. Note that I am using the question-answer pairs without multiple answers.

Shell Script

Alternatively, the shell script below should download all the necessary files by reading the links from the two text files, also given below (place the text files in the same data/ directory as the shell script). It will also rename meta_ALL_Beauty.json.gz to meta_Beauty.json.gz to match the name used in the question-answer file.

Links to the meta JSON files
Links to the qa JSON files
Shell script to download the JSON files
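In case you want to roll your own, a minimal sketch of such a download script might look like the following. The link-file names (meta_links.txt and qa_links.txt) are assumptions; adjust them to match the text files you saved.

```shell
#!/usr/bin/env bash
# Sketch: download every archive listed in the two link files into data/,
# then rename the Beauty metadata file to match the QA file's name.
# Assumed link-file names: data/meta_links.txt and data/qa_links.txt.

download_all() (
    # subshell so the cd does not leak to the caller
    cd data 2>/dev/null || return 0
    for links in meta_links.txt qa_links.txt; do
        [ -f "$links" ] || continue
        while read -r url; do
            # skip blank lines, fetch everything else
            [ -n "$url" ] && curl -sSL "$url" -o "$(basename "$url")"
        done < "$links"
    done
    # match the name expected by the question-answer file
    if [ -f meta_ALL_Beauty.json.gz ]; then
        mv meta_ALL_Beauty.json.gz meta_Beauty.json.gz
    fi
)

download_all
```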

With the data files in place, we can start training our model!


We will be using the Simple Transformers library (built on the Hugging Face Transformers library) to train the T5 model.

The instructions given below will install all the requirements.

  1. Install Anaconda or Miniconda Package Manager from here.
  2. Create a new virtual environment and install packages.
    conda create -n simpletransformers python pandas tqdm
    conda activate simpletransformers
    conda install pytorch cudatoolkit=10.1 -c pytorch
  3. Install Apex if you are using fp16 training. Please follow the instructions here. (Installing Apex from pip has caused issues for several people.)
  4. Install simpletransformers.
    pip install simpletransformers

See installation docs

Data Preparation

We can process the data files and save them in a convenient format using the script given below. This will also split the data into train and evaluation sets.

Adapted from the helpful scripts given in the Amazon Review Data page.
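The core of that preparation can be sketched as below. The function and field names here are assumptions based on the Amazon Review Data schema (metadata records carry asin and description, QA records carry asin and question); the actual script streams the .json.gz files first and finishes with to_csv(..., sep="\t") calls.

```python
import pandas as pd

def build_qg_dataset(products, qa_pairs):
    """Join descriptions with questions by asin and build the three-column
    frame (prefix / input_text / target_text) Simple Transformers expects."""
    descriptions = {p["asin"]: " ".join(p.get("description", [])) for p in products}
    rows = [
        {
            "prefix": "ask_question",
            "input_text": descriptions[qa["asin"]],
            "target_text": qa["question"],
        }
        for qa in qa_pairs
        if descriptions.get(qa["asin"])  # drop questions without a description
    ]
    return pd.DataFrame(rows)

def train_eval_split(df, eval_frac=0.1, seed=42):
    """Shuffle and split into train and evaluation frames."""
    eval_df = df.sample(frac=eval_frac, random_state=seed)
    train_df = df.drop(eval_df.index)
    return train_df, eval_df
```

The resulting frames would then be written out with, e.g., train_df.to_csv("data/train_df.tsv", sep="\t", index=False).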

Check whether you have the train_df.tsv and eval_df.tsv files in your data/ directory.

Training the Model

Data Formats

The input data to a T5 model should be a Pandas DataFrame containing 3 columns as shown below.

  • prefix: A string indicating the task to perform.
  • input_text: The input text sequence.
  • target_text: The target sequence.

Internally, Simple Transformers will build the properly formatted input and target sequences (shown below) from the Pandas DataFrame.

The input to a T5 model has the following pattern:

"<prefix>: <input_text> </s>"

The target sequence has the following pattern:

"<target_sequence> </s>"

The prefix value specifies the task we want the T5 model to perform. To train a T5 model on a new task, we simply train the model while specifying an appropriate prefix. In this case, we will be using the prefix ask_question; that is, every row in our DataFrame will have the value ask_question in the prefix column.
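A minimal illustration of the three-column format and of the sequences built from it (the example values are made up):

```python
import pandas as pd

# One illustrative row; the prefix tells T5 which task to perform
train_df = pd.DataFrame(
    [
        {
            "prefix": "ask_question",
            "input_text": "Durable dog ball with treat hole",
            "target_text": "does it squeak or not?",
        }
    ]
)

# What Simple Transformers builds internally from each row
row = train_df.iloc[0]
model_input = f"{row['prefix']}: {row['input_text']} </s>"
target = f"{row['target_text']} </s>"
```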


Training the model is quite straightforward with Simple Transformers.

As you might observe from the training script, we are using the t5-large pre-trained model. Using these parameters with the t5-large model takes about 12 hours of training with a single Titan RTX GPU. Depending on your GPU resources, you can either increase the train_batch_size to speed up training or you can decrease it to fit a GPU with less VRAM (Titan RTX has 24 GB).

Note that you can offset the effect of a small batch size by increasing the gradient_accumulation_steps. The effective batch size is roughly equal to train_batch_size * gradient_accumulation_steps.
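The setup might be sketched as follows. The argument names follow the Simple Transformers T5 API, but the values here are illustrative rather than the exact ones used for the run described above.

```python
model_args = {
    "max_seq_length": 196,
    "train_batch_size": 8,
    "gradient_accumulation_steps": 8,  # effective batch size ~ 8 * 8 = 64
    "num_train_epochs": 4,
    "evaluate_during_training": True,
}

def train(train_df, eval_df):
    # Imported here so the sketch stands alone; requires `pip install simpletransformers`
    from simpletransformers.t5 import T5Model

    model = T5Model("t5", "t5-large", args=model_args)
    model.train_model(train_df, eval_data=eval_df)
    return model
```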

You can also significantly improve the training speed and GPU memory consumption by opting for the t5-base model. This will likely result in a comparatively worse (but by no means poor) model.

This training script will also automatically log the training progress using the Weights & Biases framework. You can see my logs here.

Evaluating the Model

Evaluating a language generation model is a little more complicated than evaluating something like a classification model. This is because there is no right answer you can compare against like you could with a classification model. The evaluation dataset contains descriptions and the questions that people have asked about those products, but that doesn’t mean that those are the only right questions you can ask.

Therefore, one of the best ways to evaluate a language generation model is to generate text and have it evaluated by an actual person (or several people).

Speaking of generating text, impressive developments in decoding algorithms over the past few years have led to models capable of generating quite realistic text sequences. (Decoding algorithms are the procedures used to generate text from a language model.)

The following section gives a brief overview of the popular decoding algorithms currently in use.

Decoding Algorithms

This section is based heavily on the Hugging Face notebook on text generation. I highly recommend going through that notebook to gain a more in-depth understanding of decoding algorithms as it does an excellent job of explaining the algorithms and showing how they can be used.

  1. Greedy search — Selects the word with the highest probability as the next word at each timestep. The T5 paper uses this algorithm for short sequence generation (e.g. classification).
  2. Beam search — Tracks the n most likely hypotheses (based on word probability) at each timestep and finally chooses the hypothesis with the highest overall probability (n is the number of beams).
  3. Top-K sampling — Randomly samples a word from the K most likely next words at each time step. The number of possible words to choose from at each step is fixed.
  4. Top-p sampling — Samples a word from the smallest possible set of words whose cumulative probability (sum of probabilities for each word) exceeds the probability p at each timestep. The number of possible words to choose from at each step is dynamic.
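To make the last two rules concrete, here is a from-scratch illustration of Top-K and Top-p filtering over a next-word distribution (this is a sketch for intuition, not the Hugging Face implementation): zero out every word outside the top-k set and outside the top-p nucleus, then renormalize before sampling.

```python
import numpy as np

def top_k_top_p_filter(probs, k=0, p=0.0):
    """Apply Top-K and/or Top-p filtering to a probability vector."""
    probs = np.asarray(probs, dtype=float)
    keep = np.ones(len(probs), dtype=bool)
    order = np.argsort(probs)[::-1]  # indices sorted by descending probability
    if k > 0:
        mask = np.zeros(len(probs), dtype=bool)
        mask[order[:k]] = True  # keep only the k most likely words
        keep &= mask
    if p > 0.0:
        cumulative = np.cumsum(probs[order])
        # smallest set whose cumulative probability exceeds p
        cutoff = np.searchsorted(cumulative, p, side="right") + 1
        mask = np.zeros(len(probs), dtype=bool)
        mask[order[:cutoff]] = True
        keep &= mask
    filtered = np.where(keep, probs, 0.0)
    return filtered / filtered.sum()

# Example distribution over a 5-word vocabulary
probs = [0.5, 0.3, 0.1, 0.05, 0.05]
filtered = top_k_top_p_filter(probs, k=50, p=0.7)
# the next word would then be drawn with np.random.choice(len(probs), p=filtered)
```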

We will be using a combination of both Top-K and Top-p sampling techniques to generate questions with our T5 model. This strategy typically leads to more natural-looking text.

Question Generation

The predict() method of a Simple Transformers T5 model is used to generate the predictions or, in our case, the questions.

Here, we are generating 3 questions for each description in the eval_df dataset.
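The generation step might be sketched as below. The decoding argument names follow the Simple Transformers T5 API, but the values are illustrative; note that the prefix prepended at prediction time must match the one used during training.

```python
# Decoding settings combining Top-K and Top-p sampling
model_args = {
    "do_sample": True,          # sample rather than greedy/beam decoding
    "top_k": 50,                # Top-K filtering
    "top_p": 0.95,              # Top-p (nucleus) filtering
    "num_return_sequences": 3,  # 3 questions per description
    "max_length": 128,
}

def build_inputs(descriptions, prefix="ask_question"):
    """Prepend the same prefix the model was trained with."""
    return [f"{prefix}: {d}" for d in descriptions]

def generate_questions(model, descriptions):
    # `model` is a trained simpletransformers.t5.T5Model
    return model.predict(build_inputs(descriptions))
```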

Let’s take a look at some of the samples.

Just for fun, I’ve shuffled the generated questions with the actual question from the dataset. There are 4 questions for each description, 3 of which are generated and one is the original. See if you can tell which is which! I would love to see your guesses in the comments. 😉

Sample 1


The Smart Solar San Rafael II Solar Mission Lantern will provide elegant ambiance to any outdoor setting and is ideal for your patio, deck or garden: made from all-weather poly-plastic with a seeded glass effect, the 15-inch lantern sits on any surface, or can be hung using the integrated hanging loop. The Rafael II is illuminated by two warm white LEDs in the top with a pillar candle inside the lantern that has an amber LED for a warm glowing effect. Powered by an integral mono-crystalline solar panel and rechargeable Ni-MH battery, the Rafael II requires no wiring or operating costs. The lantern automatically turns on at dusk and off at dawn. Smart Living Home & Garden offers a 1 year limited manufacturers warranty from the original date of purchase on full products bought from authorized distributors and retailers. Established in 2002, Smart Solar offers a wide selection of exclusively solar powered products. We design, manufacture, and customize all of our own items for your patio and garden. Enjoy our solar powered, energy efficient, and environmental friendly lighting solutions, water features, and outdoor decor. We are confident you will love solar living — that’s why we’ve been creating solar products and growing the solar lifestyle for nearly 15 years.


  • what is the height from the ground to the LED bulb? thanks
  • What kind of battery does the pillar candle use?
  • What size bulbs does it take?
  • Are they heavy? We get a lot of wind and they will be on tables

Sample 2


Durable dog ball with treat hole


  • will it be safe for a pug to play with it?
  • does it squeak or not?
  • will it pop when the dog chews on it
  • What is the weight of this item?

Sample 3


Petco River Rock Shallow Creek Aquarium GravelPetco Aquarium Gravel is ideal for freshwater and safe for marine aquariums. This high quality gravel has colorful, durable coatings specifically developed for their permanence and non-toxicity. The gravel is processed to remove potentially harmful debris and materials. It will not affect the water’s chemistry, nor harm any fish, invertebrates or plants. Can be used in aquariums, ponds, water gardens and terrariums.


  • I have a betta fish with very delicate fins. I want to make sure I get gravel that’s not going to scratch or tear them. Would this stuff work?
  • Has anyone tried this in saltwater, if so how does it hold up?
  • What are the dimensions of the bag and the plastic/stuff it comes in?
  • Is this gravel good for growing algae in aquariums?

Sample 4


Enter a world of building fun with the LEGO City Starter Set featuring 3 iconic vehicles. Catch the robber with the policeman on his motorcycle! Put out the fire with the firemans speedy fire truck. Then race to help the fallen skater boy in the ambulance. Create endless play possibilities with all the inspiration a young builder needs to explore fun ways of saving the day! Includes 5 minifigures with accessories: robber, policeman, fireman, rescuer and a skater boy. 272 pcs. Ages 5 yrs. +.


  • How much space in the box will the starter set take up when all the pieces are in the set?
  • What size are the blocks themselves?
  • Can Lego minifigures be made to fit on the LEGO HOMES?
  • What color is the set? The picture is not clear and looks dark.

Sample 5


Elegant and sleek, this TV Stand a new look to your home. Finished in a dark Espresso color. Two sliding doors. Four Sections for storage.


  • Is there any way to adjust the height in this unit or does the width need to adjust itself if my TV is not 32″?
  • How tall are the shelves? I have a tall receiver and want to be sure it will fit.
  • What are the dimensions of the two storage compartments? Thanks!
  • Can the drawers be removed or are they fixed?

Sample 6


Did we say cotton? You bet we did. The Men’s Charged Cotton Longsleeve T-shirt may feel like a regular cotton T-shirt, but it’s anything but ordinary. Its unique fabrication combines the classic comfort of cotton with the built-in water-resistance of all-weather gear to create the world’s first true performance cotton T-shirt. It feels soft but dries faster than regular cotton, so you’ll never be weighed down. Lightweight comfort. Stretchable mobility. This is the most powerful cotton T-shirt you’ll ever put on. After all, it is Under Armour.


  • can you get a large in this for me?
  • will this work for running/doing sprints? i have small arms and not alot of flexibility but do alot of sprinting. would a vrs2 look ok on me
  • what is the shirt size for a 12 year old boy?
  • Size large is what chest size in inches?

Bonus sample


Connect the Dotters! Dotters, our 10 happy-faced Dalmatian dog, is made of our super-soft Pluffies material that’s not only cuddly, but machine washable!