Deep Learning at Barclays with MXNet

Until early March 2018, I was a data scientist in the Advanced Data Analytics team at Barclays. The research and ideas I discuss here were developed at Barclays and some of the content here first appeared on the Barclays internal wiki.

MXNet is a “flexible and efficient library for deep learning”.


In the testing we did at Barclays, training Keras models with the TensorFlow backend was dramatically slower than training with the MXNet backend. Additionally, the MXNet backend used far fewer resources than the TensorFlow backend.


MXNet is a C++ library with bindings for Python, R, and Scala. One can define and train a model using the Python bindings, then export the model to files which can be read by a Scala program. The model can then be used to make predictions on a massive dataset. This is a really awesome feature.

Making predictions on a massive dataset is useful, but not nearly as powerful as training MXNet models in a distributed setting on YARN clusters, with or without the help of Apache Spark. With Spark, MXNet starts a parameter server process on the driver and on each executor, which allows models to be trained on massive datasets across a pre-existing cluster. To my knowledge this is the only deep learning library to offer this level of language and context flexibility.
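The parameter-server pattern itself is simple, and a conceptual sketch helps make the Spark setup above concrete. The code below is illustrative pure Python, not the MXNet KVStore API: a central store holds the weights, each "executor" pushes gradients computed on its own data shard, and the server aggregates them before the workers pull the updated weights back.

```python
import numpy as np

class ParameterServer:
    """Toy single-process stand-in for a distributed parameter server."""
    def __init__(self, init_weights, lr=0.1):
        self.weights = init_weights
        self.lr = lr

    def push(self, grads):
        # Aggregate the gradients from all workers, then take one SGD step.
        avg = sum(grads) / len(grads)
        self.weights = self.weights - self.lr * avg

    def pull(self):
        return self.weights

# Gradient of a least-squares loss ||Xw - y||^2 on one worker's shard.
def grad(X, y, w):
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

# Two "executors", each holding half of the data.
shards = [(X[:50], y[:50]), (X[50:], y[50:])]

server = ParameterServer(np.zeros(3))
for step in range(200):
    w = server.pull()  # each worker pulls the current weights...
    server.push([grad(Xs, ys, w) for Xs, ys in shards])  # ...and pushes grads

print(np.round(server.pull(), 2))  # converges toward [1.0, -2.0, 0.5]
```

In MXNet the same push/pull cycle is handled by its KVStore: workers push gradients keyed by parameter name and pull the aggregated weights, which is what lets training scale across executors without any shared memory.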

MXNet can run on a CPU backend, on up to 256 GPUs in a single machine, or across a cluster, as long as the GPUs support CUDA.

Why not just use Keras?

MXNet does work as a backend in Keras; however, I advocate using MXNet directly. MXNet only works with the dmlc fork of Keras, which differs markedly from the main Keras project. This makes it challenging to find community support and to get new features of either Keras or MXNet. Additionally, in model development at Barclays we quite quickly reached the point where we needed access to the finer points of the model, which is much more straightforward when using MXNet directly.

Isn’t TensorFlow the clear option for scalable Deep Learning?


TensorFlow models are slow, due in part to a computational model that does not in-line matrix operations.

TensorFlow is riding on a wave of hype, however it scales poorly and requires a massive buy-in: if you aren’t prepared to run a dedicated TensorFlow cluster, forget it. It isn’t portable when compared to MXNet: one cannot simply export a model from TensorFlow in Python and load it into an Akka cluster to score it on a stream of data coming from a web server in a type-safe scalable manner.


Source: Deep Learning on Medium