Uber Open Sources Manifold to Visually Debug Machine Learning Programs

Source: Deep Learning on Medium

The new tool will help data scientists detect performance issues with models and datasets more quickly.

Uber continues its amazing contributions to the machine learning open source community. From probabilistic programming languages like Pyro to low-code machine learning model tools like Ludwig, the transportation giant has been regularly releasing tools and frameworks that streamline the lifecycle of machine learning applications. Just yesterday, Uber announced that it was open sourcing Manifold, a model-agnostic visual debugging tool for machine learning models. The goal of Manifold is to help data scientists identify performance issues across datasets and models in a visually intuitive way. Uber first discussed Manifold in a blog post early last year and immediately received a lot of requests to open source the stack.

Machine learning programs differ from traditional software applications in the sense that their structure is constantly changing and evolving as the model builds more knowledge. As a result, debugging and interpreting machine learning models is one of the most challenging aspects of real-world artificial intelligence (AI) solutions. Debugging, interpretation and diagnosis are active areas of focus for organizations building machine learning solutions at scale. The challenge of debugging and interpreting machine learning models is nothing new, and the industry has produced several tools and frameworks in this area. However, most of the existing stacks focus on evaluating a candidate model using performance metrics such as log loss, area under curve (AUC), and mean absolute error (MAE) which, although useful, offer little insight into the underlying reasons for the model’s performance. Another common challenge is that most machine learning debugging tools are constrained to a specific type of model (e.g., regression or classification) and are very difficult to generalize across broader machine learning architectures. Consequently, data scientists spend tremendous amounts of time trying different model configurations until they reach the performance they need.
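To see why a single aggregate metric says so little about the underlying reasons, consider the minimal Python sketch below. It computes log loss over a whole toy dataset and then again per data segment. The segment names and all of the numbers are hypothetical and made up for illustration; nothing here comes from Manifold or from Uber's data.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical rows: (segment, true label, predicted probability of label 1)
rows = [
    ("short_trip", 1, 0.9), ("short_trip", 0, 0.1),
    ("short_trip", 1, 0.8), ("short_trip", 0, 0.2),
    ("long_trip", 1, 0.3), ("long_trip", 0, 0.6),
]

# One number for the whole dataset
overall = log_loss([r[1] for r in rows], [r[2] for r in rows])

# The same metric, broken down by segment
grouped = {}
for segment, y, p in rows:
    labels, probs = grouped.setdefault(segment, ([], []))
    labels.append(y)
    probs.append(p)

segment_loss = {seg: log_loss(ys, ps) for seg, (ys, ps) in grouped.items()}

print(f"overall: {overall:.3f}")
for seg, loss in segment_loss.items():
    print(f"{seg}: {loss:.3f}")
```

In this toy data the overall figure sits between a segment the model handles well and one it handles badly, which is exactly the kind of detail an aggregate metric alone does not reveal.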

Enter Manifold

A company like Uber operates hundreds of machine learning models across dozens of teams. As a result, debugging and interpretability of those models become a key aspect of the machine learning pipeline. With Manifold, the Uber engineering team wanted to accomplish some very tangible goals:

· Debug code errors in a machine learning model.

· Understand the strengths and weaknesses of one model, both in isolation and in comparison with other models.

· Compare and ensemble different models.

· Incorporate insights gathered through inspection and performance analysis into model iterations.

To accomplish those goals, Manifold segments the machine learning analysis process into three main phases: Inspection, Explanation and Refinement.

· Inspection: In the first part of the analysis process, the user designs a model and attempts to investigate and compare its outcomes with those of existing models. During this phase, the user compares typical performance metrics, such as accuracy, precision/recall, and the receiver operating characteristic (ROC) curve, to get a coarse-grained sense of whether the new model outperforms the existing ones.

· Explanation: This phase of the analysis process attempts to explain the different hypotheses formulated in the previous phase. This phase relies on comparative analysis to explain some of the symptoms of the specific models.

· Refinement: In this phase, the user attempts to verify the explanations generated in the previous phase by encoding the knowledge extracted from them into the model and testing its performance.
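The coarse-grained comparison in the Inspection phase can be sketched in a few lines of Python. This is only an illustration of the idea, not Manifold's implementation: the function name `classification_metrics`, the two model names, and the labels and predictions are all hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical ground truth and predictions from two models
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "baseline": [1, 0, 0, 1, 0, 1, 0, 0],
    "candidate": [1, 0, 1, 1, 0, 1, 1, 0],
}

report = {name: classification_metrics(y_true, y_pred)
          for name, y_pred in predictions.items()}

for name, metrics in report.items():
    summary = ", ".join(f"{k}={v:.3f}" for k, v in metrics.items())
    print(f"{name}: {summary}")
```

A table like this tells the user whether the candidate looks better overall, but not why; that question is what the Explanation and Refinement phases are meant to address.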

The three steps of the machine learning analysis process materialize in a simple user interface that streamlines the debugging of machine learning models. The Manifold user interface consists of two main dialogs:

1) Performance Comparison View: Provides a visual comparison between model pairs using a small multiple design, and a local feature interpreter view.

2) Feature Attribution View: Reveals a feature-wise comparison between user defined subsets and provides a similarity measure of feature distributions.
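To make the idea of a feature-wise comparison between two subsets more concrete, here is a rough Python sketch. It bins each feature into a histogram for both subsets and scores how far apart the two histograms are. The choice of KL divergence as the measure is an assumption made for this example, since the description above only mentions "a similarity measure"; the feature names, values, and helper functions are likewise hypothetical.

```python
import math


def histogram(values, lo, hi, bins=5, smoothing=1e-6):
    """Normalized histogram over [lo, hi], smoothed so no bin is exactly zero."""
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # all values identical: everything lands in bin 0
    counts = [0.0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = sum(counts) + smoothing * bins
    return [(c + smoothing) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def rank_features(subset_a, subset_b, bins=5):
    """Rank features by how differently they are distributed in two subsets."""
    scores = {}
    for name in subset_a:
        a, b = subset_a[name], subset_b[name]
        lo, hi = min(a + b), max(a + b)  # shared bin edges for both subsets
        scores[name] = kl_divergence(histogram(a, lo, hi, bins),
                                     histogram(b, lo, hi, bins))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical subsets, e.g. rows a model gets wrong vs. rows it gets right
poorly_predicted = {
    "trip_distance": [12, 15, 14, 18, 20],
    "hour_of_day": [8, 12, 17, 20, 9],
}
well_predicted = {
    "trip_distance": [2, 3, 4, 3, 5],
    "hour_of_day": [9, 13, 18, 21, 8],
}

ranked = rank_features(poorly_predicted, well_predicted)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy example `trip_distance` ranks first because its values barely overlap between the two subsets, while `hour_of_day` is distributed almost identically in both. A ranking like this points the user toward the features most worth investigating.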