Agile ML: Some Things I Learned About Rapid Experimentation in Real World Machine Learning Projects

Experimentation is a key element of machine learning solutions. Unlike most software development disciplines, machine learning relies on experimentation and validation rather than on deterministic tests to assess the behavior of a specific model. Given a use case, there could be numerous machine learning methods that produce viable solutions, and the final selection must be based on running and comparing different alternatives. While experimentation is conceptually simple, it can become a nightmare in large machine learning solutions.

Building a solid architecture for experimentation is challenging because experimentation is often treated as an after-the-fact element of machine learning solutions. When building a simple machine learning scenario that includes a single model, data scientists will conduct a handful of experiments in their dev environment and settle on a model that is eventually productionized. That approach is wrong for many reasons, including the fact that any given machine learning problem can have numerous solutions that perform differently against specific datasets. Let’s increase the complexity of our basic scenario by considering an environment in which a data scientist can produce several experiments that are executed and evaluated over time in order to eventually select a production workflow. Now, let’s imagine this type of experimentation environment being used by dozens of data scientists across different teams. Are you getting the picture? Without the right infrastructure, experimentation can become the biggest bottleneck in large-scale machine learning solutions.

Experimentation is the key element to achieving agility in machine learning solutions. A streamlined experimentation architecture allows data scientists to develop, test and evaluate different machine learning models for a specific scenario and to capture knowledge that can be reused in future models. Unfortunately, most machine learning stacks are not designed for large-scale experimentation, which forces data science teams to regularly build proprietary infrastructure to enable experimentation processes that work well in their organization. After facing that challenge a few times at Invector Labs, we have identified a series of ideas that might be relevant to other data science teams looking to build agile experimentation into their machine learning solutions. Here are some of my favorites:

Experimentation and Production Might Use Different Machine Learning Frameworks

Most organizations dream of standardizing on a single machine learning framework for all their workflows. In our experience, that idea has proven impractical in large organizations, where different data scientists are likely to use different frameworks to conduct their experiments.

What makes a good experimentation framework is different from the capabilities you want in a production runtime. Simplicity and a rich portfolio of algorithms are key for rapid experimentation, while GPU-level scalability, concurrent messaging and performance are important for a production framework. A framework like PyTorch is very efficient for modeling experiments but still immature for production compared to alternatives like TensorFlow and Caffe2. In order to enable agility, a machine learning experimentation infrastructure should support models written in different frameworks, which can eventually be translated into production models.

ONNX Can Be Useful

When I first read about the Open Neural Network Exchange (ONNX) format, I didn’t think much about it beyond being a nice effort to avoid irreparable fragmentation in the machine learning space. Well, it turns out that ONNX can become extremely useful for enabling experimentation across different machine learning frameworks.

A pattern that we have used several times, and that has proven efficient in organizations like Facebook, is to use ONNX as an intermediate representation between experimentation and production workflows. In that approach, experimentation models are translated into an ONNX model, which can then be converted into the native format of the production runtime framework.
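As an illustration, here is a minimal sketch of that pattern using PyTorch on the experimentation side and onnxruntime as a stand-in for the production side; the model, tensor names and file name are placeholders, and the consuming runtime could just as well be Caffe2 or any other ONNX-compatible framework.

```python
# Sketch: export an experimentation model (PyTorch) to ONNX, then load
# it in a separate runtime. Model and file names are illustrative.
import torch
import torch.nn as nn

# Hypothetical experimentation model.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
model.eval()

dummy_input = torch.randn(1, 10)  # example input that traces the graph
torch.onnx.export(model, dummy_input, "experiment.onnx",
                  input_names=["features"], output_names=["score"])

# The ONNX file can now be served by a production-oriented runtime.
import onnxruntime as ort
session = ort.InferenceSession("experiment.onnx")
outputs = session.run(None, {"features": dummy_input.numpy()})
```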

Experimentation-Ready Notebook Templates

Interactive notebooks are the preferred experimentation tool for data scientists. However, notebooks have a few issues that make them challenging for running experimentation workflows. For starters, code included in notebooks tends to be incredibly messy, containing all sorts of training and testing logic. Also, notebooks are not easy to parameterize or run on a scheduled basis. Fortunately, there are some good open source libraries that address many of these challenges.

In order to enable rapid experimentation with code authored in notebooks, consider using a template that standardizes a model’s input and output formats as well as other configuration parameters. Libraries such as Papermill can be incredibly helpful in this effort.
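For instance, here is a minimal sketch of parameterized execution with Papermill; the notebook and parameter names are illustrative, and the pattern assumes the notebook declares default values in a cell tagged "parameters".

```python
# Sketch: run a templated experimentation notebook with injected
# parameters. Papermill overrides the defaults in the "parameters" cell
# and writes an executed copy that serves as a record of the run.
import papermill as pm

pm.execute_notebook(
    "train.ipynb",              # templated notebook (hypothetical name)
    "train_lr0.01.ipynb",       # executed copy, preserved for review
    parameters={"learning_rate": 0.01, "epochs": 20},
)
```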

A Notebook-Based Runtime

Once we have experimentation models in a consistent notebook format, the next step is to automate the execution of those models. Experimentation workflows can run for hours or even days before producing meaningful results. Most machine learning production environments rely on containers to encapsulate models in an executable, portable and scalable format. However, containers are not a great vehicle for machine learning experimentation.

A notebook server is a great vehicle for enabling scheduled execution and rapid testing of machine learning experimentation workflows. Technologies such as Commuter provide a solid foundation for executing and sharing notebooks at scale and can be easily adapted as a machine learning experimentation runtime.
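As a rough illustration of the runtime idea, the sketch below executes every notebook found in an experiments directory and archives the executed copies; the directory names are hypothetical, and a scheduler such as cron or Airflow is assumed to invoke it periodically.

```python
# Sketch: a tiny notebook-based experimentation runtime. Each notebook
# in experiments/ is executed via Papermill and the result is archived
# under runs/ so its outputs can be inspected later.
from pathlib import Path
import papermill as pm

def run_all(experiments_dir="experiments", output_dir="runs"):
    Path(output_dir).mkdir(exist_ok=True)
    for nb in Path(experiments_dir).glob("*.ipynb"):
        pm.execute_notebook(str(nb), str(Path(output_dir) / nb.name))

if __name__ == "__main__":
    run_all()  # called on a schedule by an external orchestrator
```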

Workflow Tracking

The key element of machine learning experimentation is comparing the results of different machine learning models. To do that, models need to be able to track important metrics during runtime. A tracking service should be a core component of a machine learning experimentation infrastructure. This service should record the outputs of a model as well as other relevant metrics such as execution time, loss functions, parameters, etc. While you can build a tracking service from scratch, frameworks like MLflow already include similar components that can be incorporated into your machine learning experimentation architecture.
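As an example, here is a minimal sketch of run tracking with MLflow; the run, parameter and metric names are illustrative, and the loss values are stand-ins for a real training loop.

```python
# Sketch: record parameters and metrics for one experimentation run.
import mlflow

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 20)
    for epoch in range(20):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training loss
        mlflow.log_metric("loss", loss, step=epoch)
    mlflow.log_metric("execution_time_sec", 134.2)
```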

Metric Visualization

Complementing the tracking service, a machine learning experimentation runtime needs to be able to visualize and compare the inputs, metrics and outputs of different machine learning models. While machine learning workflows can be arbitrarily complex, the metrics for specific architectures such as linear regression or decision trees are well-documented. In that sense, it is relatively easy to visualize the metrics relevant to a model and compare the results. Facebook’s FBLearner Flow is a great reference architecture for this type of approach.
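Building on the MLflow sketch above, comparing runs programmatically can be as simple as querying the tracking store; the specific parameter and metric fields below are illustrative, following MLflow’s params./metrics. column convention.

```python
# Sketch: pull all tracked runs into a pandas DataFrame, ranked by loss,
# as the raw material for a comparison dashboard.
import mlflow

runs = mlflow.search_runs(order_by=["metrics.loss ASC"])
print(runs[["run_id", "params.learning_rate", "metrics.loss"]].head())
```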

Versioning Versioning Versioning

Establishing a versioning strategy for machine learning models is essential for successful experimentation. Keeping track of different versions of a model as well as its historical performance will allow data scientists to understand the evolution of experiments and adapt rapidly to new requirements.
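One way to implement this, sticking with MLflow as an example, is its Model Registry, where re-registering a model under the same name creates a new version; the model name below is hypothetical, and the registry assumes a database-backed tracking server.

```python
# Sketch: version a trained model in the MLflow Model Registry.
# "churn-classifier" is a hypothetical registered model name.
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

with mlflow.start_run() as run:
    model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])  # toy fit
    mlflow.sklearn.log_model(model, "model")

# Registering the same name again later yields version 2, 3, ... so an
# experiment's history stays queryable alongside its tracked metrics.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "churn-classifier")
```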

Leverage ML Cloud Runtimes if Possible

Experimentation is one of those areas in which cloud machine learning platforms can make a difference. The toolsets of platforms such as AWS SageMaker, Google Cloud ML or Azure ML are far more sophisticated than on-premise equivalents. Given that experimentation workflows are not subject to the same regulatory requirements as production pipelines, it is often possible to leverage cloud runtimes for experimentation and eventually move the workflows into on-premise production runtimes.

Rapid experimentation is essential to achieving agility in large machine learning environments. While experimentation is often ignored when designing machine learning architectures, it can become one of the biggest bottlenecks at a certain scale. The ideas listed in this article don’t necessarily apply to every single experimentation environment, but they can serve as guidance for building more agile experimentation into your machine learning solutions.