Source: Deep Learning on Medium

# Solving Differential Equations with Transformers

In this article, I will cover a new Neural Network approach to solving 1st and 2nd order Ordinary Differential Equations, introduced in Guillaume Lample and François Charton (Facebook AI Research)’s ICLR 2020 spotlight paper, “Deep Learning for Symbolic Mathematics”¹. This paper tackles symbolic computation tasks of integration and solving 1st & 2nd order ODEs with a seq2seq Transformer, we will focus on the latter today.

To give context to this paper, although Neural Network methods have seen great success in clearly structured statistical pattern recognition tasks — e.g. object detection (Computer Vision), speech recognition, semantic analysis (Natural Language Processing) — symbolic reasoning is not one of its strengths.

Not only does Symbolic Computation require AI to infer complex mathematical rules, they also require a flexible, contextual understanding of abstract mathematical symbols in relation to each other. At the time of authoring, Computer Algebra Systems (CAS) (such as Matlab, Mathematica) held state-of-the-art performance on symbolic mathematics tasks, driven by a backend of complex algorithms such as the 100-page long Risch algorithm for indefinite integration.

However, these semi-algorithms are far from perfect: failing in specific cases and sometimes timing out indefinitely. In their paper, Lample and Charton develop a seq2seq approach in the attempt to outperform Computer Algebra Systems. Their contributions can be summarised as follows:

**Demonstrating the potential of seq2seq transformers in 3 symbolic mathematics tasks and by extension in symbolic reasoning****Achieving state-of-the-art performance in these tasks (with non-trivial dataset & inference time constraints²)****Introducing a novel approach to generate arbitrarily large datasets of expressions & corresponding solutions, where each expression is uniformly sampled from the specified problem space.**

Before performing inference, extensive pre-processing is done in areas of task structuring & definition, and dataset generation. Lample and Charton make 2 insightful observations — firstly, that specific cases in function integration can be significantly simplified by pattern recognition; secondly, that formal mathematics can be described using natural language prefix syntax (also in previous research³). The authors first parse mathematical expressions into tree structures, subsequently represent trees as sequences, then investigate the size of their problem space, and finally propose methods for dataset generation. Although the authors’ methods of investigating the problem space and of data generation are extremely interesting, for the purposes of this article, I will focus on sections “Expressions as Trees”, “Trees as Sequences” and “Experiments”. I will write a follow-up article on the whole paper soon!

**Expressions as Trees**

Tree data structures are inherently hierarchical and can reflect important features of mathematical expressions. By considering operators and functions as internal nodes of trees; operands as nodes; constants and variables as leaves, the authors achieve 3 things: encode order of operations information, describe associativity of operators, simplification by eliminating the need for parentheses. To illustrate this method, let’s parse the expression *a*²+2*ab+b*²: