Interpret Neural Networks Through Dynamical Systems


Photo by Uriel Santillan Carrion on Unsplash

Neural Networks and Deep Learning are extremely popular topics of discussion among computer science enthusiasts and the general public alike, yet many have only a basic understanding of how these models drive Artificial Intelligence through a learning process. Simply put, these techniques learn intelligent tasks when we provide data (e.g. learning to classify cats vs. dogs from labelled data, though current systems are far more capable than this).

Neural Networks (or Artificial Neural Networks) were inspired by the biological neurons in the brain. These man-made networks are built up of simple processing units called artificial neurons, which work in parallel and gain experiential knowledge through a learning process [1]. Different types of Neural Network architectures (Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Deep Residual Networks…) have achieved impressive breakthroughs in areas such as image classification, natural language processing, and speech recognition. However, they are still considered “black box” models, since we lack a proper understanding of, and explanations for, how these networks operate.

In this article we focus on using Dynamical Systems to interpret the behaviour of Neural Networks and try to demystify them. We explored dynamical systems and their characteristics in the previous articles.

Summarising the previous articles: a dynamical system is a system whose states evolve over a state space with time, based on a fixed rule. For example, in the motion of a simple pendulum, the position and velocity (the states) vary with time according to a physical law (visit the previous articles to understand these concepts and the vector fields related to dynamical systems).
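As a minimal sketch (the pendulum model and its parameters below are standard textbook choices, not taken from the previous articles), the state of a simple pendulum evolves under a fixed rule that we can step forward in time:

```python
import numpy as np

def pendulum_rule(state, g=9.81, length=1.0):
    """The fixed rule: d(theta)/dt = omega, d(omega)/dt = -(g/L) * sin(theta)."""
    theta, omega = state
    return np.array([omega, -(g / length) * np.sin(theta)])

# Evolve the state (angle, angular velocity) by repeatedly applying the rule with a small time step.
state = np.array([np.pi / 4, 0.0])   # start at 45 degrees, at rest
dt = 0.01
for _ in range(1000):
    state = state + dt * pendulum_rule(state)

print(state)  # the state of the pendulum after 10 simulated seconds
```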

Recent research has used dynamical systems as a framework to explain the underlying concepts of neural networks. This idea was introduced in the paper “A Proposal on Machine Learning via Dynamical Systems” by Weinan E (2017) [2]. This article focuses on summarising the main ideas; the reader is directed to recent publications for rigorous mathematical proofs.

One main aspect of Machine Learning is to identify a mathematical model (function) that can represent a given set of data to some desired accuracy. For example, in the following graph the task is to identify a model that can represent temperature data. The ability to identify such a mathematical model helps us understand the data and predict future data.

Image Credit: https://github.com/eriklindernoren/ML-From-Scratch, Copyright © 2017 Erik Linder-Norén, MIT License

In classical approximation theory this model estimation starts from simple functions, which are combined (through addition and multiplication) to construct more complex functions that fit the target data (e.g. adding multiple sine waves in a Fourier series to approximate a signal). Different basis functions, wavelets, and splines have been used in this approach, and its main limitation is the curse of dimensionality (i.e. issues that arise in high-dimensional spaces but not in low dimensions).
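For illustration, here is a hedged sketch of the classical approach (the square-wave target and the sine basis are arbitrary choices made for this example, not taken from the article): simple functions are weighted and added together to approximate the data.

```python
import numpy as np

# Target data: a signal we want to approximate (illustrative choice).
x = np.linspace(0, 1, 200)
y = np.sign(np.sin(2 * np.pi * x))          # a square-ish wave

# Simple basis functions: the first few odd sine harmonics.
n_terms = 5
basis = np.column_stack([np.sin(2 * np.pi * (2 * k + 1) * x) for k in range(n_terms)])

# Combine them by weighted addition; least squares finds the weights.
coeffs, *_ = np.linalg.lstsq(basis, y, rcond=None)
approximation = basis @ coeffs

print(np.max(np.abs(y - approximation)))    # worst-case approximation error
```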

Neural networks are more complicated models that learn a more complex mathematical function. The main difference is their use of “compositions of functions”, compared with the addition and multiplication of functions in classical approximation theory. This aspect is summarised below. Each layer of the neural network can be viewed as a function whose output is fed into the function of the next layer (f1(f2(f3(x)))). We use dynamical systems to understand these compositions.

Image by Author
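A toy sketch of this difference (the three functions below are arbitrary placeholders, not the layers of an actual network): a network composes functions, feeding the output of one into the next, rather than summing them.

```python
import numpy as np

# Three hypothetical "layers", each a simple function of its input.
def f3(x):
    return 2.0 * x + 1.0          # a linear map

def f2(x):
    return np.tanh(x)             # a component-wise non-linearity

def f1(x):
    return x ** 2                 # another simple transformation

x = np.array([0.5, -1.0])

# Classical approximation combines functions by addition / multiplication...
additive = f1(x) + f2(x) + f3(x)

# ...whereas a neural network composes them: the output of one feeds the next.
composed = f1(f2(f3(x)))

print(additive, composed)
```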

Recall from the previous articles that continuous-time dynamical systems can be represented by the equation below. These continuous equations are solved using different numerical approximation methods, and the corresponding discrete dynamical system can be obtained through the Forward Euler approximation. The example vector field from the previous article, shown below, represents how sample trajectories vary with time over the state space (i.e. the dynamical nature).

Image by Author
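The equation itself appears in the original article only as an image; assuming the standard form used in this line of work, the continuous-time system and its Forward Euler discretisation can be written as:

```latex
% Continuous-time dynamical system: the state z(t) evolves under a fixed rule f.
\frac{\mathrm{d}z(t)}{\mathrm{d}t} = f\bigl(z(t)\bigr), \qquad z(0) = x

% Forward Euler approximation with step size \Delta t gives the discrete dynamical system.
z_{n+1} = z_n + \Delta t \, f(z_n)
```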

In neural networks, each layer applies a linear transformation (weight * x + bias) followed by a component-wise non-linear activation (e.g. sigmoid, ReLU). Multiple such layers are combined through composition. This can be viewed as a discrete dynamical system in which each layer applies a transformation that maps one space to another. Compared with the graph above, where a state (a point in space) is transformed to another state according to a fixed rule, here you can think of a flow map between different spaces. This idea is further illustrated in the figure below, where each layer represents a vector space that is transformed into another vector space, similar to the behaviour of a dynamical system.

Image by Author
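A minimal sketch of this flow-map view (the random weights below are placeholders, not trained values): each layer maps every point in the current space to a point in the next space.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small cloud of 2-D points: the "state" of the system at layer 0.
z = rng.normal(size=(100, 2))

# Each layer: a linear transformation (W z + b) followed by a component-wise non-linearity.
for layer in range(4):
    W = rng.normal(scale=0.8, size=(2, 2))   # placeholder weights
    b = rng.normal(scale=0.1, size=2)        # placeholder bias
    z = np.tanh(z @ W + b)                   # every point is mapped into the next space

print(z.shape)   # still (100, 2): the same points, transported through four spaces
```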

Let’s try to understand this idea further through an example. In image classification, CNNs learn different features at each layer. Starting from raw pixels, they first learn to identify edges, then parts of an object, and finally the whole image. These different features are identified at different layers of the network. Hence, the network can be understood as a flow map in which each layer represents a different feature space; along the layers these feature spaces are transformed, and finally we obtain the desired result.

To establish this idea mathematically, we can think of a Residual Network. A Residual Network is similar to a general network but has skip connections from one layer to another. This is captured by the equation and diagram below, which represent a residual block. The equation at a given layer of the network has the same form as the discrete dynamical system equation we identified earlier. Hence, Residual Networks can be interpreted as discrete dynamical systems.

Image by Author
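To make the correspondence concrete, here is a hedged sketch (placeholder weights and a tanh branch chosen purely for illustration, not the article's exact block): the residual update x_{l+1} = x_l + f(x_l) has exactly the shape of a Forward Euler step z_{n+1} = z_n + Δt · f(z_n), with the step size absorbed into f.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4
W = rng.normal(scale=0.5, size=(dim, dim))   # placeholder weights for the residual branch
b = rng.normal(scale=0.1, size=dim)

def f(x):
    """The residual branch: a linear transformation followed by a non-linearity."""
    return np.tanh(x @ W + b)

def residual_block(x):
    """x_{l+1} = x_l + f(x_l): skip connection plus learned transformation."""
    return x + f(x)

def forward_euler_step(z, dt=1.0):
    """z_{n+1} = z_n + dt * f(z_n): one step of the discrete dynamical system."""
    return z + dt * f(z)

x = rng.normal(size=dim)
print(np.allclose(residual_block(x), forward_euler_step(x, dt=1.0)))  # True: identical updates
```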

But how can non-residual networks be interpreted through dynamical systems?

Yang et al., in their ICML 2020 paper “Interpolation between Residual and Non-Residual Networks”, explore this aspect [3]. They propose a continuous dynamical system with damping and establish the relationship between dynamical systems and CNNs.

There are many other interesting relationships and aspects that can be analysed when neural networks are viewed as dynamical systems. How can different network architectures and their structural diversity be understood? How can the stability of neural networks be analysed? Recent research has explored these questions, analysing network architectures through different numerical approximation methods (in this article we identified that the Forward Euler approximation corresponds to simple residual networks). Neural networks suffer from issues such as vanishing and exploding gradients, while dynamical systems can exhibit chaotic behaviour and dynamic explosions; the stability analysis of dynamical systems lets us explore these issues. These aspects will be discussed in the next article. I hope this article provided you with a basic idea of understanding neural networks through dynamical systems.

Image by Author

References

[1] S. Haykin, Neural Networks and Learning Machines, Chapter 1.

[2] Weinan, E. (2017). A Proposal on Machine Learning via Dynamical Systems. Communications in Mathematics and Statistics, 5(1), 1–11.

[3] Yang et al. (2020). Interpolation between Residual and Non-Residual Networks. In Proceedings of ICML 2020.