Natural Language Generation from Structured Data

Source: Deep Learning on Medium

Dileep Pasumarthi and Daljeet Virdi

University of Illinois Urbana Champaign

Developed: September 2019

Over the past few years, rapid improvement in state-of-the-art (SOTA) general language models (BERT, GPT-2, XLNet, etc.), driven by new neural architectures (LSTMs, Transformers, etc.) and new pre-training techniques (masked language modeling, bi-directionality, etc.), has led to human-level performance on a wide range of natural language processing tasks (question answering, translation, text-to-speech, etc.). These developments promise the same wide-ranging impact on NLP that ImageNet had on computer vision, which ushered in a new era of autonomy in automotive (self-driving cars), retail (cashier-less checkout), financial services (identity verification), healthcare (radiology), and beyond.

In this article, we explore a subfield of NLP, natural language generation (NLG), and one technique for generating text from structured data (data-to-text synthesis). Potential applications include auto-generating news articles, weather reports, industry reports, and business insights. These pieces can be hyper-personalized by locale, context, and reading style. For example, data from babies in neonatal care can be converted into text differently, with different levels of technical detail and explanatory language, depending on whether the intended reader is a doctor, a nurse, or a parent (Mahamood & Reiter, 2011). Similarly, different match reports can be generated for fans of each team: the winning goal for one side is likely to be considered a lucky one from the perspective of the losing side. A human journalist would not dream of writing separate reports about a single sports match, but for a computer this is not an issue, and a reader who receives a more personally appropriate report is likely to appreciate it.

We write a program to create narrative summaries from financial time series data. Financial news means different things to different people, based on an individual’s portfolio. This research can usher in a new class of financial news generation, moving from one news article for everyone to one news article per recipient, with direct application in wealth management, financial reporting, and investor relations.

An NLG system involves three processes:

1. Content determination — deciding what you are going to say

2. Sentence planning — deciding how you are going to say it

3. Surface realization — choosing the specific words to use; you could call this style or flow

Developers traditionally implement these three stages and assemble them in a pipeline. Each piece requires codifying rules in templates and significant engineering effort. Most natural language applications are first built in this rule-based way.
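The three stages above can be sketched as a tiny rule-based pipeline. Everything in this sketch — the record schema, function names, thresholds, and templates — is invented for illustration, not taken from a real production system:

```python
# A minimal sketch of the three-stage NLG pipeline; schema, thresholds,
# and templates are all illustrative.

def determine_content(record):
    """Content determination: decide which facts are worth reporting."""
    facts = []
    if "close" in record and "prev_close" in record:
        facts.append(("price_change", record["close"] - record["prev_close"]))
    if record.get("volume", 0) > 1_000_000:
        facts.append(("high_volume", record["volume"]))
    return facts

def plan_sentences(facts):
    """Sentence planning: map each fact to an abstract sentence plan."""
    plans = []
    for kind, value in facts:
        if kind == "price_change":
            plans.append(("change", "rose" if value >= 0 else "fell", abs(value)))
        elif kind == "high_volume":
            plans.append(("volume", value))
    return plans

def realize(ticker, plans):
    """Surface realization: render each plan with a rule-based template."""
    sentences = []
    for plan in plans:
        if plan[0] == "change":
            sentences.append(f"{ticker} {plan[1]} ${plan[2]:.2f} today.")
        elif plan[0] == "volume":
            sentences.append(f"Trading volume was heavy at {plan[1]:,} shares.")
    return " ".join(sentences)

record = {"close": 115.57, "prev_close": 113.40, "volume": 2_400_000}
print(realize("Apple", plan_sentences(determine_content(record))))
```

Note how every stage is hand-coded: adding a new kind of fact means touching all three functions, which is the engineering burden that motivates neural approaches.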

We are in the midst of a massive overhaul of these applications toward neural architectures as deep learning NLP research matures. For example, up until roughly 2015, production-grade spell checkers ran to about 2,000 lines of code. With neural probabilistic approaches, modern spell checkers can be as little as ~17 lines of code. The complexity of these systems has shrunk in some ways (less rote rule-based programming) but grown in others, as new questions arise: how accurate, precise, and unbiased are these models?
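The ~17-line figure refers to compact probabilistic correctors in the spirit of Peter Norvig's well-known essay. Here is a condensed sketch; the corpus is a toy stand-in for the large word-frequency count a real corrector would use:

```python
import re
from collections import Counter

# Toy corpus; a real corrector counts words from a large text dump.
CORPUS = "the quick brown fox the spelling of a language model is the point"
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Noisy-channel approximation: most frequent known candidate wins."""
    candidates = ({word} & WORDS.keys()) or (edits1(word) & WORDS.keys()) or {word}
    return max(candidates, key=WORDS.__getitem__)

print(correct("speling"))  # -> spelling
```

The rule-based edit distance supplies candidates; the corpus frequencies act as a crude language model that ranks them.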

Most modern generalized language models create coherent natural language (see: GPT-2, BERT, etc.), but practitioners are warned not to use them for production use cases because they cannot distinguish “fact from fiction”. The text may sound natural, but it is often not factually accurate.

We ‘fine-tune’ these models, training them specifically to be precise and factual when generating text from structured data. Our approach combines two key components on top of SOTA generalized language models:

  1. A copy mechanism to select and copy content from the structured data
  2. A general language model (GPT-2, BERT, XLNet) to compose coherent sentences
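To show what the copy mechanism does at a single decoding step, here is a toy pointer-generator-style blend (in the spirit of See et al.); the vocabulary, attention weights, and gate value are all invented numbers:

```python
# Toy copy-mechanism step: blend the generator's vocabulary distribution
# with the attention distribution over the source (structured-data) tokens.
vocab = ["<unk>", "born", "is", "a", "footballer"]
source_tokens = ["neil", "murphy", "footballer"]  # tokens from the table

p_vocab = [0.1, 0.3, 0.3, 0.2, 0.1]  # generator's softmax over the vocabulary
attention = [0.5, 0.4, 0.1]          # attention over the source tokens
p_gen = 0.4                          # learned gate: generate vs. copy

# Final distribution: p_gen * P_vocab(w) + (1 - p_gen) * attention mass on w.
# In this toy, copied mass for out-of-vocabulary names lands on <unk>; a full
# implementation extends the vocabulary with the source tokens so rare names
# can be emitted verbatim.
final = [p_gen * p for p in p_vocab]
for tok, att in zip(source_tokens, attention):
    idx = vocab.index(tok) if tok in vocab else 0
    final[idx] += (1 - p_gen) * att
```

With these made-up numbers, most of the probability mass (0.58) ends up on the copied name tokens, mimicking how a fine-tuned model reproduces names and dates from the table rather than inventing them.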

We use the pre-trained, domain-independent language model GPT-2, and train it to learn a copy mechanism using just a few (< 200) training examples.

Wikipedia articles about people often come with a table of structured data, and the first few sentences of the article narrate that table. With 200 such Wikipedia articles, we fine-tune GPT-2 to write sentences about people that sound like those written by Wikipedia authors, scoring 16 BLEU points.
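Fine-tuning a language model on (table, sentence) pairs requires flattening each pair into a single token sequence. A minimal linearization sketch follows; the separator tokens and the "key : value" layout are our own illustrative convention, not a fixed standard:

```python
def linearize(infobox, sentence):
    """Flatten a (table, sentence) pair into one training string so a
    language model can be fine-tuned end to end on data-to-text examples."""
    # Separator tokens <table>/<summary> and "key : value" are assumptions.
    fields = " | ".join(f"{k} : {v}" for k, v in infobox.items())
    return f"<table> {fields} <summary> {sentence}"

example = linearize(
    {"name": "charles whibley", "born": "1859", "died": "1930",
     "occupation": "literary journalist and author", "nationality": "english"},
    "charles whibley ( 1859 - 1930 ) was an english literary journalist and author .",
)
print(example)
```

At generation time, the model is prompted with everything up to and including the `<summary>` token and asked to continue.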

Given 200 examples like this:

A table (Wikipedia infobox, shown as an image in the original post):

First few lines of Wikipedia article text:

charles whibley ( 1859–1930 ) was an english literary journalist and author .

We could then feed in new tables and generate sentences like these:

  1. rihards kuksi ( born 17 july 1988 in riga , latvia ) is a latvia scholar at a rihard kuksi university in slorida .
  2. chris cron ( born aug 1964 ) is a former american right-handed hitter .
  3. neil murphy ( born 19 may 1980 ) is a former english footballer who played as a central defender .
  4. hüinsey atalay ( born 27 october 1991 ) is a turkish footballer currently playing for antalya

The style mirrors that of Wikipedia writers and surfaces interesting behaviors. For example, the model inferred that Chris Cron was American because the birthplace in his table was Boston.

We tried to extend this work to the finance domain, but we were unable to find structured data paired with summaries for finance, like that from Wikipedia, so we had to build the dataset ourselves. We took publicly available New York Stock Exchange data, aggregated and analyzed it using PySpark, extracted key metrics, and wrote templatized summaries using the realisation engine SimpleNLG.
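The aggregation itself ran as a PySpark job over the NYSE data; the metric extraction followed roughly this logic, sketched here in plain Python over made-up rows:

```python
from statistics import mean, stdev

# Toy stand-in for the NYSE price rows; the real pipeline aggregated
# these with PySpark before extracting metrics.
rows = [
    {"ticker": "AAPL", "date": "2016-11-01", "close": 111.49},
    {"ticker": "AAPL", "date": "2016-11-02", "close": 111.59},
    {"ticker": "AAPL", "date": "2016-11-03", "close": 109.83},
    {"ticker": "AAPL", "date": "2016-11-04", "close": 108.84},
]

def key_metrics(rows, ticker):
    """Extract the figures the templates verbalize: average, min, max,
    and a volatility estimate ("give or take")."""
    closes = [r["close"] for r in rows if r["ticker"] == ticker]
    return {
        "avg": round(mean(closes), 2),
        "min": min(closes),
        "max": max(closes),
        "stdev": round(stdev(closes), 2),
    }

print(key_metrics(rows, "AAPL"))
```

Each metric then gets slotted into a SimpleNLG template to produce sentences like the ones shown below.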

We plugged this training data into our model, but were surprised to find it learned the template structure in fewer than 100 epochs. So we tried a small hack to add more variety: we used the Datamuse API to find synonyms of words before creating sentences, and randomly picked 2–5 sentences from the 10–15 we generated.
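The synonym hack amounted to blind word substitution. A sketch follows, with a hard-coded synonym map standing in for the live Datamuse API responses our pipeline fetched per word:

```python
import random

# Hard-coded synonym lists standing in for live Datamuse API responses
# (api.datamuse.com); the real pipeline fetched these at runtime.
SYNONYMS = {
    "decrease": ["decrement", "decline", "diminish"],
    "lowest": ["shortest", "least"],
    "expected": ["awaited", "anticipated"],
}

def vary(sentence, rng):
    """Blindly swap listed words for a random synonym to add surface variety.
    Ignoring part of speech and register is what produced oddities like
    'shortest stock price' in the training corpus."""
    out = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        out.append(rng.choice(options) if options else word)
    return " ".join(out)

rng = random.Random(0)
print(vary("That is an average decrease of $2.17. Apple had the lowest stock price.", rng))
```

As the outputs below show, this variety came at the cost of fluency, since the substitutions were never checked for grammatical fit.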

Financial Data Example:

Structured Data (shown as an image in the original post):

Templatized summary:

That is an average decrement of $2.17 comparative to past month! Not good. Apple had the shortest stock price with $105.44. And the average this period of time is more or less $111.13, give or take 19.24. The awaited average stock cost for Apple in the expected month is $108.96!

After the 50th epoch…

Apple had the top stock value with $115.57. Apple Averages were 111 this quarter.. Interested to know more?. Here is some Averages slue to empathize!. Apple Averages were 111 this quarter..

…and the 700th epoch:

That is an average decline of $1.67 relative to past month! That is an average diminish of $1.03 compared to past month! Not good. Apple had the least stock price with $105.44.

As you can see, our synonym hack got in the way, which highlights why clean training data is important. To improve these results, we could use Amazon Mechanical Turk to build a small minimum-viable corpus instead of using templates, try the approach on other data-to-text datasets we encountered (such as restaurant reviews and weather summaries), and swap the 117M-parameter GPT-2 model we retrained for the more robust 1.5B-parameter model, which has much stronger benchmarks.

The creators of GPT-2 cautioned that because “large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases that require the generated text to be true.” However, training them on structured data and teaching them a copy mechanism reduces this risk considerably and opens up a wide range of applications.