Source: Deep Learning on Medium

In most cases the black-box paradigm is a true observation, but in this article I wish to show that it doesn’t necessarily follow that deep-learning or complex models are black boxes by some intrinsic nature. Instead, I deconstruct their black-boxiness down to a problem of reduction. I then provide an account of how the non-linear complexity of the model motivates a non-reductive picture. Last, I assert that a generalizing (deep-learning) model has the potential to answer at least one **explanandum** (*a fact or thing to be explained*) that goes beyond the sum of its parts, and thus no model is entirely a black box. I then talk about why I think this is important.

We begin with a thought experiment to help build some initial intuition. Consider that a trained model is fully deterministic: a certain input will always produce a certain output. Let’s say Charlie has built a model that makes stunningly good predictions, for example whether someone will develop cancer in 10 years based on a full-body MRI (such a capability would change medicine; we’re upping the stakes here). Charlie’s model has 50 million parameters and, “amazingly”, the exact value of every single one of these parameters is known, and the exact way they interact is also known. Well, of course: this is typical of building a deep-learning model. Now, a doctor uses Charlie’s model and wants Charlie to explain why it said her patient will get cancer in 10 years. Charlie says, “Exactly because plugging that data into a series of non-linear functions with these exact 50 million parameters yields that prediction.” And Charlie feels perfectly correct. They can give you a flash drive that supposedly has all the information needed to explain how their model works.

But that sucks as an answer, and the important question is why. We have a sense that the explanation isn’t good enough, or isn’t talking about the right things. It is here that we encounter the black-box contradiction: even though we have the exact structure of the model on hand, we also have a sense that we don’t know anything about how the model works. Where does this come from?

The central disconnect is that the doctor is thinking in terms of biologically or medically relevant features, which we implicitly assume exist in the MRI data and could be predictive of cancer. She is really asking Charlie to produce an explanation of the network in terms of these features. The word for how Charlie describes or views their model is reductionist: everything the model does can be reduced to its parts. There’s no “I-think-this-grey-pixel-blob-is-a-sign-of-impending-cancer” node that fires. So when we say black box, what we are really saying is that a neural network only provides explanation as far as its parts, and since its parts have no relationship with our desired explanation aside from its functional success, we say, “data in, black-box middle, prediction out”.

Is this it? No, there is a very important property of neural networks that indicates we can safely take a *non-reductive* view. Since we also assume, and have observed, that the models generalize to new data (and since Charlie’s super model got FDA approval, it must have), we know there must exist at least one feature that goes beyond the reduced view of the model. The argument goes like this:

(1) a model **M** with structure **{W}** (its weights, connections, etc.) was created by some process to predict **Y** from *n* examples of input data **X** in a set **{X}**.

(2) the model reliably predicts **Y’** from never-seen input data **X’**.

(3) since **{X’}** is disjoint from **{X}**, there exists a reliable feature fully represented in neither **{X}** nor **{X’}**.

(4) from (2, 3), there is at least one reliable feature represented by **M** that is represented in neither **{X}** nor **{X’}**.

(5) from (1, 4), there exists a property of **M** which is not reducible to the structure **{W}**.

**A model which generalizes to new data is not reducible to its definition.**

To truly stand by this, we must provide an account of how certain non-reductive behaviors emerge from the model in terms of features that are not immediately apparent in the model. It is important to understand, then, that the black-boxiness of a model is not a property of the architecture, but of our expectation of how that architecture should work or how it came to be. This is a sneaky shift, but one that frees us to perhaps open a door to having models that feel “explainable”. So let’s try:

Our starting clue is that neural networks only work because of their non-linearities: without them, any stack of linear layers collapses into a single linear map. One consequence is that the order of operations matters in a neural network. A simple example is that applying a matrix operation (i.e. a linear layer) followed by the ReLU activation will not, in general, yield the same output as applying ReLU followed by the same matrix operation. I show this in the figure below, which was inspired by Montufar *et al.* In their paper *On the Number of Linear Regions of Deep Neural Networks*, they find and assert that “Deep networks are able to sequentially map portions of each layer’s input-space to the same output. In this way, deep models compute functions that react equally to complicated patterns of different inputs.”
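A minimal sketch of the ordering point above, assuming NumPy. The weight matrix `W` and input `x` are hypothetical values chosen only for illustration; they are not taken from the figure or from the paper.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative entries are clipped to zero."""
    return np.maximum(z, 0.0)

# Hypothetical 2x2 linear layer and input, chosen only for illustration.
W = np.array([[1.0, -1.0],
              [0.5,  2.0]])
x = np.array([1.0, 2.0])

# Linear layer first, then ReLU: W @ x = [-1.0, 4.5], clipped to [0.0, 4.5].
linear_then_relu = relu(W @ x)

# ReLU first, then linear layer: relu(x) = x here, so the result is [-1.0, 4.5].
relu_then_linear = W @ relu(x)

order_matters = not np.allclose(linear_then_relu, relu_then_linear)

# By contrast, two linear layers with nothing in between collapse
# into a single linear map (W2 @ W1), which is why depth alone adds nothing.
W2 = np.array([[2.0, 0.0],
               [1.0, 1.0]])
collapses = np.allclose(W2 @ (W @ x), (W2 @ W) @ x)

print(linear_then_relu, relu_then_linear, order_matters, collapses)
```

The two orderings disagree in the first coordinate, where the pre-activation is negative. That sign-dependent clipping is the only thing separating the network from one big matrix.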

A useful idea that emerges from this is that of “folding” a higher-dimensional space onto itself. Intrigued by this, I explored how a non-linearity changes the output space, and indeed found that even a simple 2-node network computes a function that reacts equally to different inputs. But there is another feature: a ‘crease’ separating where this folding does and does not happen. Further, I find that the crease in the input space is non-perpendicular to the input dimensions **only for the non-linear combination of inputs**. This crease is a hyperplane within the input space: on one side the output varies linearly with its inputs, and on the other side it is squashed to an equal output.
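The folding and the crease can be sketched with a single ReLU unit on two inputs, assuming NumPy. The weights `w`, the bias `b`, and the sample points are hypothetical, and this is not the exact 2-node network behind the figure. For such a unit the crease is the set of inputs where `w @ x + b == 0`.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical unit: h(x) = relu(w . x + b). Its crease is the line x1 + x2 = 1.
w = np.array([1.0, 1.0])
b = -1.0

def unit(x):
    return relu(x @ w + b)

# Two different inputs on the squashed side (w . x + b <= 0) give the same output.
a = np.array([0.2, 0.3])    # pre-activation -0.5
c = np.array([-2.0, 1.5])   # pre-activation -1.5
folded = (unit(a), unit(c))          # both are 0.0

# Two inputs on the other side: the output tracks the inputs linearly.
p = np.array([1.0, 1.0])    # pre-activation 1.0
q = np.array([2.0, 1.0])    # pre-activation 2.0
unfolded = (unit(p), unit(q))        # 1.0 and 2.0

# The crease is axis-aligned only if exactly one weight is non-zero.
# Here both inputs are combined before the ReLU, so the crease is oblique.
crease_is_axis_aligned = np.count_nonzero(w) == 1

print(folded, unfolded, crease_is_axis_aligned)
```

If ReLU were instead applied to each input separately, the creases would sit at `x1 = 0` and `x2 = 0`, both aligned with the input axes. The oblique crease appears only once the inputs are mixed before the non-linearity.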

Since the location of this plane is found via optimization, we can then say that a linear layer, together with its activation, solves for an optimal boundary between where its inputs matter and where they don’t, for every combination of inputs. This folding and crease effect goes beyond the sum of the inputs and forms an explanatory unit: this boundary exists because there is some region of the feature space which explains the outcome, and also a region which doesn’t. For a model that generalizes well to new patients, say Charlie’s super FDA-approved model, it follows that there must be a feature that is not in the training data but that exists in the model. It means that Charlie’s model has at least one explanatory unit, and if there’s an explanatory unit, there’s an explanandum, or something to be explained. **This means that there is always something to be learned from a neural network that generalizes.**
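To make "found via optimization" concrete, here is a toy sketch, assuming NumPy and plain full-batch gradient descent on a one-input ReLU unit. The target function, starting values, learning rate, and step count are all hypothetical choices for illustration, not a recipe from any real model.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical data: the target has its crease at x = 0.5 (where 2x - 1 = 0).
x = np.linspace(-2.0, 2.0, 201)
y = relu(2.0 * x - 1.0)

# Start from a unit whose crease is in the wrong place (x = 0).
w, b = 1.0, 0.0
lr = 0.1  # assumed learning rate

for _ in range(5000):
    z = w * x + b
    err = relu(z) - y
    active = (z > 0).astype(float)   # gradient flows only on the un-squashed side
    grad_w = np.mean(2.0 * err * active * x)
    grad_b = np.mean(2.0 * err * active)
    w -= lr * grad_w
    b -= lr * grad_b

learned_crease = -b / w   # the input value where w * x + b = 0
print(w, b, learned_crease)
```

Only the inputs on the active side of the crease contribute to the gradient, so the optimizer is in effect negotiating where the boundary between "matters" and "doesn't matter" should sit.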

So let’s take this back to the beginning. We found that the usual approach to the black-box problem begs the question by assuming that machine learning models are reductive. I then showed that there exists at least one feature that is not fully in the training data if the model generalizes (which is the goal of any model). I then argued that a deep-learning model can produce, through its non-linearities, features that are not reducible to some pattern of inputs. These features at least partially explain *something*, and since there is something explainable by a model, the model is not purely opaque.

Jeez. Okay. Maybe all of this was a super complicated and verbose way of saying “They are just too complex to understand”. But I disagree: I think this formulation, if correct, can provide inspiration to search for these small **explanatory units** or to think of ways to create ones that are easier to find.

In my work, we predict tumor grade from whole-slide tissue images using a deep-attention network. A key feature of this is the so-called attention map, the distribution of the attention weights over the tissue. We find that for this specific classification, high weights cluster at the tumor boundary, where the differentiation of tumor cells is most apparent and extreme. This is our higher-level explanatory unit, and we are happy to have one. It’s uncanny to see this feature *emerge* from the initial maps.
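For readers who have not met an attention map, here is a generic sketch of attention-weighted pooling over image tiles, assuming NumPy. It is not the network described above: the tile embeddings are random stand-ins, and `score_vector` is a hypothetical placeholder for whatever a trained model would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 6 tiles from one slide, each with a 4-dim embedding.
n_tiles, dim = 6, 4
tile_embeddings = rng.normal(size=(n_tiles, dim))
score_vector = rng.normal(size=dim)   # placeholder for learned attention parameters

def softmax(s):
    s = s - s.max()                   # subtract the max for numerical stability
    e = np.exp(s)
    return e / e.sum()

# One scalar score per tile, normalised into weights that sum to 1.
scores = tile_embeddings @ score_vector
attention = softmax(scores)

# The slide-level representation is the attention-weighted average of the tiles.
slide_embedding = attention @ tile_embeddings

# The "attention map" is these weights laid back over the tile positions;
# the highest-weighted tile is where the model is looking hardest.
top_tile = int(np.argmax(attention))
print(attention.round(3), top_tile)
```

Because the weights form a distribution over tiles, they can be drawn directly on the tissue, which is what makes this kind of explanatory unit easy to look at.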

What if you just have a straight MLP with 10 hidden layers and no attention map to observe? It’s just millions of parameters and it just works! I don’t have an answer for you right now, but maybe if you become convinced that there is some new feature learned by your network, something that goes beyond anything within the training data, you’ll also find inspiration to at least look for it. I personally think you can, and that you should. If your model is generalizing to novel data, it has found combinations of inputs that either matter or don’t. It has not only extracted signal from the noise; there are new signals in how these signals are extracted.

Now, clearly, this doesn’t mean it’s easy to see huge networks as translucent and explain everything about them. That’s the hard part, but I am optimistic that we will get better at understanding these powerful models we build, and that we will find new ways to do so.