Deep Reinforcement Learning (DeepRL) for Abstractive Text Summarization made easy (Tutorial 9)

Source: Deep Learning on Medium

2- Why use it with Deep Learning?

Recently, there has been an approach of combining reinforcement learning with deep learning, in what is called DeepRL. This has turned out to be extremely beneficial and successful, especially in the field of NLP.

This comes from the fact that the way deep networks optimize NLP tasks is actually different from the way we measure the quality of the output (as mentioned by yaserkl in his amazing repo and their paper, and also in this paper by Paulus et al).

They have pointed out that:

  • we tend to train our deep architecture with maximum likelihood estimation (MLE), i.e. by minimizing a cross-entropy loss against the reference tokens, as in most other tasks that we use deep learning to solve.
  • However, we tend to use other metrics, specific to NLP, like BLEU and ROUGE, for evaluating the output.
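To make the first point concrete, here is a minimal sketch of an MLE-style loss for one output sequence. It is only an illustration: the function name `mle_loss`, the tiny 3-word vocabulary, and the probabilities are all made up for this example and do not come from the repo or the papers mentioned above.

```python
import math

def mle_loss(step_probs, reference_ids):
    """Average negative log-likelihood of the reference tokens.

    step_probs[t] is the model's probability distribution over the
    vocabulary at decoding step t (hypothetical values here).
    reference_ids[t] is the index of the correct word at step t.
    """
    nll = 0.0
    for probs, token_id in zip(step_probs, reference_ids):
        # Penalize the model when it puts low probability on the reference word.
        nll -= math.log(probs[token_id])
    return nll / len(reference_ids)

# Two decoding steps over a 3-word vocabulary (made-up numbers).
step_probs = [[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]]
reference_ids = [0, 1]
loss = mle_loss(step_probs, reference_ids)
print(loss)
```

Notice that this loss only looks at the probability given to each reference word, one step at a time. It never looks at the whole generated sentence, which is what BLEU and ROUGE score.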

BLEU and ROUGE are used in NLP tasks to measure the overlap between the reference and the output (i.e. the number of words, or more generally n-grams, that appear in both the reference and the output sentence). As the overlap increases, the score increases.
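As a rough illustration of this overlap idea, here is a simplified unigram (ROUGE-1 style) score. This is a sketch under simplifying assumptions: it splits on whitespace, lowercases everything, and does no stemming, so it will not match the numbers produced by the official ROUGE tools. The function name `rouge1` is chosen just for this example.

```python
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict:
    """Simplified unigram overlap between a reference and a candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps, per word, the smaller of the two counts.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

In this example, 5 of the 6 reference words also appear in the candidate, so the recall is 5/6. The score is built from counting and taking minimums over discrete words, which is the root of the problem discussed next.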

So we are actually optimizing something different from what we evaluate the output with. This creates an inconsistency between the training and testing metrics.

The real reason, however, that deep learning methods cannot optimize BLEU and ROUGE directly is that deep architectures can only optimize differentiable objectives, while BLEU and ROUGE are non-differentiable: they are computed by counting matches between discrete words, so they can’t simply be expressed as an equation whose gradient can be back-propagated through a deep architecture.

This is where reinforcement learning comes in. Using its ability to optimize non-differentiable metrics, a deep architecture can be trained to optimize BLEU and ROUGE directly, by treating the metric score as a reward.
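To give a feel for how a non-differentiable score can still drive learning, here is a toy sketch of the score-function (REINFORCE-style) update. It is not the implementation from the repo or the papers mentioned above. To stay small, it assumes the "model" only picks one of a few fixed candidate summaries instead of generating words one by one, and it uses a simple word-overlap reward. All names (`unigram_overlap`, `reinforce`, the candidate sentences) and the hyperparameters are hypothetical.

```python
import math
import random

def unigram_overlap(reference, candidate):
    # Non-differentiable reward: fraction of reference words found in the candidate.
    ref = reference.split()
    cand = set(candidate.split())
    return sum(w in cand for w in ref) / len(ref)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(reference, candidates, steps=3000, lr=0.2, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)   # the policy's only parameters
    baseline = 0.0                     # running average of past rewards
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an output from the current policy.
        action = rng.choices(range(len(candidates)), weights=probs)[0]
        reward = unigram_overlap(reference, candidates[action])
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # For a softmax policy, d log pi(action) / d logits = onehot(action) - probs.
        # The reward only scales this gradient; it is never differentiated.
        for i in range(len(logits)):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_pi
    return softmax(logits)

reference = "the cat sat on the mat"
candidates = ["a dog ran away", "the cat sat", "the cat sat on the mat"]
probs = reinforce(reference, candidates)
print(probs)
```

The key point is that the update only needs the value of the reward and the gradient of the log-probability of the sampled output. In this toy setup, the probability mass should drift toward the candidate with the highest overlap.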

In the next tutorial, if GOD wills it, we will go into the details of a reinforcement learning algorithm that has been used to address the Abstractive Text Summarization task. It is called Policy Gradient, and the code for this implementation can be found here.

I truly hope this tutorial has given you an insight into the worlds of Deep Learning and Reinforcement Learning, and I hope to see you in the next tutorials.