Interpretations in Learning — Part 1

Introduction

Artificial Intelligence (AI) is becoming ubiquitous in both science and industry due to deep learning’s superiority in very specific tasks such as image classification¹. While the success of deep learning should be celebrated, we must be aware of and reconcile the following two facts:

  1. Science aims to explain reality, and
  2. Deep learning struggles to explain itself.

Considering this juxtaposition, can we trust deep learning as a scientific tool to explain reality, and in turn exploit these discoveries within industry? Arguably not, as we are blindsided by hidden risks and by the explanations deep learning cannot provide, especially when it makes predictions beyond its training examples.

Being able to interpret a prediction, and then explain the problem, is fundamental for the general performance of AI and absolutely crucial for safety, reliability, and fairness². Thus, in order to advance AI we must also advance interpretations of learning.

In this blog I provide an intuitive primer on deep learning and suggest that uncertainty is the first level of interpretability we should address before reasoning about significance and causality (Figure 1).

Figure 1: The canonical structure of quantitative interpretations. All models are equipped with predictive capabilities, the foundation of uncertainty, which in turn informs significance and then causality. With causality, perhaps the ultimate interpretive power, we could answer counterfactual “what if” and “why” type questions, and begin to reason far away from training examples (something statistics and machine learning are thus far incapable of)³.

A primer in deep learning and uncertainty

Most simply, deep neural networks are flexible functions, f, that map some high-dimensional input, x, to some target, y = f(x). This map from x to y is often referred to as a task, e.g., reading a genomic signature (x) and predicting whether a patient has cancer (y = 1) or not (y = 0). Neural networks learn tasks from experience (i.e., data) almost automatically, which is why deep learning models are becoming so widespread.

One limitation with the interpretability of traditional neural networks (NNs) is that they are often deterministic. This means that, given a single input x, a deterministic NN will predict the same output y = f(x) with each repeated forward propagation (see Figure 2 below). There is no variation in output, which limits the network's ability to simulate reality's inherent randomness. Furthermore, estimates of uncertainty from a deterministic neural network will be overconfident compared to a probabilistic alternative. Hence, for the sake of safety, robustness, and reliability of interpretations, we must employ some probabilistic method; namely, Bayesian neural networks (BNNs).

Figure 2: A feedforward neural network. Each black dot represents a vector of parameters (or neuron). A layer of neurons transforms the data (input from the left) and forward propagates the resulting activations to the next layer. The final output layer yields the prediction.
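
To make this concrete, below is a minimal sketch of a deterministic feedforward network in PyTorch. The architecture, layer sizes, and binary cancer-prediction framing are hypothetical, chosen only to mirror the example above; the point is simply that repeated forward passes on the same input return exactly the same prediction.

```python
import torch
import torch.nn as nn

# A small deterministic feedforward network: input x -> prediction y = f(x).
# Layer sizes and the single sigmoid output are illustrative placeholders.
model = nn.Sequential(
    nn.Linear(100, 64),   # e.g. a 100-dimensional genomic signature as input
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
    nn.Sigmoid(),         # predicted probability that the patient has cancer (y = 1)
)
model.eval()              # inference mode: fixed parameters, no dropout

x = torch.randn(1, 100)   # a single (random) input example

# Repeated forward propagation yields the identical output every time,
# so there is no output distribution to inspect and no notion of uncertainty.
with torch.no_grad():
    predictions = [model(x).item() for _ in range(5)]

print(predictions)        # five identical values
```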

Bayesian neural networks extend traditional neural networks by randomising the model's parameters. [Note: there is a specific theoretical motivation for Bayesian neural networks, stemming from Bayes' rule, but it is esoteric and unnecessary for readers not interested in statistics. We can set it aside for now and focus on why we should care about uncertainty in deep learning.]

Each time a BNN forward propagates an input x, a different realisation of the random parameters, ø, transforms the information, yielding a different output f(x,ø) on each repetition. Repeating this process enough times gives us a distribution of predictions p(f(x,ø)) (Figure 3).

Figure 3: Producing a distribution of outputs p(f(x,ø)) with Monte Carlo Dropout. Each simulation has a different weight configuration, whereby black nodes indicate active neurons and grey nodes indicate "dropped out" neurons. The parameters in dropped-out neurons are all set to zero. The idea is that multiple forward passes, each with a different weight configuration, yield a distribution of outputs p(f(x,ø)). This, in turn, provides a mechanism to inspect the model's uncertainty (i.e., variance or entropy).
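
As a sketch of the Monte Carlo Dropout procedure in Figure 3 (again in PyTorch, with a hypothetical architecture and dropout rate), the only change from the deterministic network above is that dropout is kept active at prediction time, so every forward pass samples a different weight configuration:

```python
import torch
import torch.nn as nn

# The same hypothetical feedforward network, now with dropout layers so that
# neurons can be randomly "dropped out" (set to zero) on every forward pass.
model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(32, 1),
    nn.Sigmoid(),
)
model.train()             # keep dropout active at prediction time (Monte Carlo Dropout)

x = torch.randn(1, 100)

# Each forward pass uses a different realisation of the random parameters ø,
# so repeating the pass many times yields a distribution of outputs p(f(x, ø)).
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])

mean_prediction = samples.mean()   # the model's average prediction
model_uncertainty = samples.var()  # spread across passes ~ model uncertainty

print(mean_prediction.item(), model_uncertainty.item())
```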

The variance of the p(f(x,ø)) distribution defines a notion of model uncertainty about the neural network. Model uncertainty is one of two distinct components of uncertainty provided by a BNN, and it decreases as we gather more training examples or adopt a more suitable model architecture.

The other component of uncertainty that we can model is data uncertainty, which indicates the inherent randomness of the data itself. (See Figures 4a and 4b below for a pictorial example of the difference between model and data uncertainty.)

These uncertainties provide the prerequisites for increasing our interpretive power to reason about significance and causality (recall Figure 1 above). Furthermore, uncertainty itself enables useful interpretations pertinent to reliability, predictive performance, optimisation, explainability, and risk management.

Figure 4a: Data uncertainty in pink and model uncertainty in green when trained on a few data examples (magenta points).
Figure 4b: Data uncertainty in pink and model uncertainty in green when trained on many data examples (magenta points).

Figures 4a and 4b above illustrate how the uncertainty decreases as the BNN is given more examples to train on. These figures also touch on something that I will expand on in the next blog article in this series: with BNNs, the total uncertainty can be deconstructed into two components, data uncertainty (risk) and model uncertainty.
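
As a small preview of that decomposition, here is a sketch in NumPy of one common, entropy-based way to split total uncertainty into a data component and a model component. It assumes we already have Monte Carlo samples of class probabilities for a single input (e.g., from Monte Carlo Dropout); the sample values below are made up.

```python
import numpy as np

def decompose_uncertainty(probs):
    """Split total predictive uncertainty into data and model components.

    probs: array of shape (n_samples, n_classes) -- Monte Carlo samples of the
    predicted class probabilities for a single input (e.g. from MC Dropout).
    This entropy-based decomposition is one common choice, not the only one.
    """
    eps = 1e-12
    mean_probs = probs.mean(axis=0)

    # Total uncertainty: entropy of the averaged predictive distribution.
    total = -np.sum(mean_probs * np.log(mean_probs + eps))

    # Data uncertainty: the average entropy of each individual sampled
    # prediction, i.e. randomness that remains for a fixed weight configuration.
    data = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))

    # Model uncertainty: the remainder, reflecting disagreement between samples;
    # it shrinks with more training data or a more suitable architecture.
    model = total - data
    return total, data, model

# Hypothetical samples for a 3-class problem: the samples disagree, so the
# model-uncertainty term will be noticeably greater than zero.
samples = np.array([[0.7, 0.2, 0.1],
                    [0.2, 0.6, 0.2],
                    [0.5, 0.3, 0.2]])
print(decompose_uncertainty(samples))
```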

Risk management: an application of uncertainty

Many applications become available when we have good estimates of uncertainty (both about the model and data). Here, I’ll introduce one such application, which should be enough to get you started. Part 2 of this blog series will provide a much more comprehensive and technical explanation of the different kinds of uncertainties, in addition to example code that performs exactly what is demonstrated below.

Many tasks involve an amount of risk that needs to be managed, be it for safety, scientific integrity, or financial profit. To illustrate this, consider the case study depicted in Figure 5 below.

If a driverless vehicle ran inference on an image (Figure 5a) to predict which object each segment of that image belongs to, and falsely predicted portions of a path as belonging to the road (Figure 5c), the car might drive off the road and compromise safety. This could be avoided by rejecting predictions whose uncertainties lie above some suitable threshold (Figure 5e). Thus, model uncertainty can improve the predictive performance and safety of AI in production.
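
A minimal sketch of that rejection rule is shown below (NumPy, with made-up per-pixel predictions, uncertainties, and threshold); Part 2 will demonstrate it properly.

```python
import numpy as np

# Hypothetical outputs from a BNN segmentation model for one image:
# per-pixel predicted class labels and per-pixel model uncertainty
# (e.g. the variance or entropy over many Monte Carlo forward passes).
predicted_labels = np.random.randint(0, 3, size=(4, 4))   # 0 = road, 1 = path, 2 = other
model_uncertainty = np.random.rand(4, 4)                   # placeholder values

# Reject (withhold) any prediction whose uncertainty exceeds a chosen threshold;
# the threshold itself would be tuned to the safety requirements of the task.
threshold = 0.5
rejected = model_uncertainty > threshold

# Downstream logic only acts on confident pixels; uncertain regions are deferred
# to a fallback (e.g. slow down, hand control back to a human driver).
confident_labels = np.where(rejected, -1, predicted_labels)  # -1 marks "no decision"

print(f"Rejected {rejected.mean():.0%} of pixel predictions")
print(confident_labels)
```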