Automatic Text Summarization: Simplified

Source: Deep Learning on Medium


Take a peek into the world of Automatic Text Summarization

Why Text Summarization?

Judging a book by its cover is not the way to go, but I guess a summary should do just fine.

In a world where the internet is flooded with an enormous amount of new data every day, being able to summarize text automatically is an important challenge. Summaries of long documents, news articles, or even conversations can help us consume content faster and more efficiently. Automatic Text Summarization is a growing field in NLP and has been getting a lot of attention in the last few years.

inshorts: an innovative mobile app that converts news articles into 60-word summaries.

I will not be discussing specific details of any algorithm or implementation. This blog is for the curious few who would like to gain a deeper understanding of how these Text Summarization models work. Anyone with no previous experience in Deep Learning or NLP can follow the blog at a superficial level, and even a rudimentary understanding of commonly used NLP models is enough to fully appreciate the details.

Types of Text Summarization

There are two types of text summarization methods: extractive and abstractive. Extractive summarization essentially picks out the sentences from the text that best represent it. Extractive techniques have been around for quite some time now, dating back to the 1950s [1]. They are more about learning the importance of each sentence and how the sentences relate to each other than about understanding the content of the text.
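To make the idea of picking out sentences concrete, here is a minimal Python sketch loosely in the spirit of Luhn's early work [1], where frequently repeated content words mark the important sentences. It is a simplification, not Luhn's exact significance-cluster formula: the tiny stopword list, the naive sentence splitter, and the helper names (`split_sentences`, `content_words`, `extractive_summary`) are hypothetical choices made only for illustration.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stopword list (an assumption; real systems use much larger ones).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "was", "it", "that", "this", "on", "for", "with", "as", "by"}

def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence: str) -> List[str]:
    """Lowercased alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def extractive_summary(text: str, k: int = 2) -> List[str]:
    """Return the k sentences whose content words are most frequent in the text."""
    sentences = split_sentences(text)
    # Document-level frequency of every content word.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average (not total) frequency, so long sentences are not automatically favored.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank by score, keep the top k, then restore the original sentence order.
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

On a short paragraph about summarization with one off-topic sentence (say, about the weather), the off-topic sentence scores lowest because its words never recur, so it is the first to be left out of the summary.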

Abstractive summarization, on the other hand, is all about trying to understand the content of the text and then providing a summary based on that, which may or may not contain the same sentences as the original text. It tries to create its own sentences and is definitely a step towards more human-like summaries.

So how exactly is it done then?

The techniques used for extractive and abstractive summarization are miles apart from each other. As mentioned earlier, extractive summarization is, crudely speaking, a sentence ranking problem, while abstractive summarization involves more complex language models because it generates new sentences.
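One well-known way to treat extraction as sentence ranking is TextRank-style graph ranking: run PageRank over a graph whose nodes are sentences and whose edge weights measure how much two sentences overlap. The Python sketch below is an illustration under stated assumptions, not a method described in this post: it uses Jaccard word overlap as the similarity (the original TextRank normalizes overlap differently), a damping factor of 0.85 and a fixed number of iterations, and the function names are hypothetical.

```python
import re
from typing import List, Set

def words(sentence: str) -> Set[str]:
    """Lowercased alphabetic tokens of a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity (an assumed stand-in for TextRank's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_sentences(sentences: List[str], damping: float = 0.85,
                   iters: int = 50) -> List[float]:
    """Score sentences by weighted PageRank over a sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [words(s) for s in sentences]
    # Symmetric edge weights between every pair of sentences; no self-loops.
    w = [[jaccard(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence receives score from its neighbors in proportion to edge weight.
        # A sentence sharing no words with any other only keeps the (1 - d) / n base.
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores
```

Sorting the sentence indices by these scores and keeping the top few gives an extractive summary. Because the score flows along overlap edges, a sentence is ranked highly when it is similar to many other sentences, which captures the "relations with each other" idea rather than any understanding of meaning.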

I personally believe extractive summarization has run its course, and most of the research focus has now shifted to abstractive summarization, which is actually a far more interesting problem (again, just my opinion!). So I won’t be talking about extractive summarization, but if you are still interested in reading about it, I would suggest this awesome blog.

In the last few years, since the arrival of Deep Learning, Abstractive Summarization, interaction with machines through natural language, and Machine Translation have all seen a lot of success. I mention Machine Translation and natural-language interaction here because they closely parallel Abstractive Summarization: all of these tasks encode an input sentence into features and then try to generate a different sentence, i.e. decode these features.

A commonly used Deep Learning-based Machine Translation model is an LSTM-based Encoder-Decoder network with Attention [2]. There have been various successful variations of this skeleton, each with its own pros and cons.

The model starts with an LSTM-based encoder, which converts the input sentence into feature vectors (one per word, with the final state summarizing the whole sentence). The decoder, also an LSTM, is responsible for creating the output one word at a time. It starts from the features provided by the encoder, and each word is predicted from the previously predicted word and the decoder LSTM’s current output. Attention weights the encoder features at every step so that the decoder focuses on the input words most relevant to the word it is currently producing.
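The attention step itself fits in a few lines of Python. This is a minimal sketch, assuming the encoder and decoder states have already been computed (the vectors below are made up, not outputs of a trained LSTM) and using the simple “dot” scoring function, one of several score functions discussed by Luong et al. [2]. `softmax` and `dot_attention` are hypothetical helper names, not a library API.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_attention(decoder_state: Vector,
                  encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product attention: score every encoder state against the decoder state."""
    # One relevance score per input word.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    # Context vector = attention-weighted average of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(a * h[k] for a, h in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context

# Three input words with hypothetical 2-dimensional encoder features.
weights, context = dot_attention([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(weights, context)
```

Here the decoder state points in the same direction as the first word's features, so that word gets the largest weight and the context vector handed to the decoder leans towards it. In a real model this is recomputed for every output word, which is what makes the features "specific to the current word".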

A detailed explanation of how an LSTM-based Encoder-Decoder network with Attention works can be found in this blog.

Seems like it’s all figured out!!

Unfortunately, no!
Generating new sentences is a complex process that machines have not yet mastered. Another issue for Abstractive Summarization is the length of the input to be encoded. While LSTMs can capture both long-term and short-term context, even they have a limit on what counts as long term. This makes summarizing really long documents difficult.

Another extremely important requirement is that a summary should never contain facts that contradict the input text. Extractive summarization does not face this problem, since it picks sentences directly from the text, but abstractive summarization is prone to such factual inconsistency.

For example, if an abstractive summarization model saw sentences like “Germany 3–2 France” and “England 3–2 Portugal” during training, then at test time it might predict “Spain 3–2 Brazil” even if the actual score in the input text is, say, 1–2. That’s because 1–2 is not part of the model’s vocabulary, but 3–2 is.
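A toy Python illustration of the vocabulary problem, assuming a word-level model whose fixed vocabulary is built from the training data. The helper names and the `<unk>` token are illustrative assumptions, and plain hyphens stand in for the en dashes in the scores. Every token never seen in training collapses to a single unknown-word id, so the model cannot represent the actual score, let alone output it.

```python
from typing import Dict, List

UNK = "<unk>"  # reserved token for anything outside the vocabulary

def build_vocab(training_tokens: List[str]) -> Dict[str, int]:
    """Give every token seen during training an id; id 0 is reserved for <unk>."""
    vocab = {UNK: 0}
    for tok in training_tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def to_ids(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids; all unseen tokens collapse to the <unk> id."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]

training = "Germany 3-2 France England 3-2 Portugal Spain Brazil".split()
vocab = build_vocab(training)
print(to_ids("Spain 1-2 Brazil".split(), vocab))  # prints [6, 0, 7]
```

The team names survive, but the score becomes id 0 (`<unk>`), while “3-2” has its own id. When generating, such a model can only choose among words that have ids, which is why it falls back on a familiar score.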

That sounds bad!!

Extractive summarization suffers from its inability to create its own sentences, while abstractive summarization struggles with the complexity of creating complete sentences on its own. A very creative middle ground between these two extremes was recently proposed in a network called the ‘Pointer Generator Network’ [3].

In simpler terms, the authors designed the network to produce two different probability distributions over what the next word should be. The first is over the model’s fixed vocabulary, while the second is over the words present in the input text, given by the attention weights. The two are then combined, weighted by a learned generation probability, to get the final distribution, so the model can either generate a word from its vocabulary or copy one directly from the input. You can read about the model in further detail here.
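Concretely, See et al. [3] mix the two distributions with a generation probability p_gen that the network predicts at every decoding step: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of the attention weights on input positions where w occurs). The Python sketch below computes only this mixing step. The values of p_gen, the vocabulary distribution, and the attention weights in the example are hand-picked assumptions standing in for what a trained network would output, and `final_distribution` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List

def final_distribution(p_gen: float,
                       vocab_dist: Dict[str, float],
                       attention: List[float],
                       source_tokens: List[str]) -> Dict[str, float]:
    """Pointer-generator mixing: generate from the vocabulary or copy from the input."""
    final: Dict[str, float] = defaultdict(float)
    # Generation path: scale the vocabulary distribution by p_gen.
    for word, p in vocab_dist.items():
        final[word] += p_gen * p
    # Copy path: each input position passes its attention weight to the word it holds,
    # so words that exist only in the input (out of vocabulary) can still get mass.
    for weight, word in zip(attention, source_tokens):
        final[word] += (1.0 - p_gen) * weight
    return dict(final)

dist = final_distribution(
    p_gen=0.3,
    vocab_dist={"Spain": 0.2, "3-2": 0.6, "Brazil": 0.2},
    attention=[0.1, 0.8, 0.1],
    source_tokens=["Spain", "1-2", "Brazil"],
)
print(max(dist, key=dist.get))  # prints 1-2
```

With attention focused on the score in the input and a low p_gen, the out-of-vocabulary “1-2” receives (1 - 0.3) * 0.8 = 0.56 of the probability mass, more than the 0.18 left for “3-2”, so the model copies the correct score instead of inventing one.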

What’s next?

If you look at some of the state-of-the-art results in abstractive summarization, you will find that they do a respectably good job. But they only work for certain types of documents and fail spectacularly on others. One of the biggest challenges at this point is generating grammatically coherent sentences from encoded features, which is a core part of both abstractive summarization and machine translation.



References

[1] Luhn, Hans Peter. “The automatic creation of literature abstracts.” IBM Journal of Research and Development 2.2 (1958): 159–165.
[2] Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. “Effective approaches to attention-based neural machine translation.” arXiv preprint arXiv:1508.04025 (2015).
[3] See, Abigail, Peter J. Liu, and Christopher D. Manning. “Get to the point: Summarization with pointer-generator networks.” arXiv preprint arXiv:1704.04368 (2017).