Source: Deep Learning on Medium
Why&How: Interpretable ML
Interpretable machine learning (or interpretable AI) techniques have received a lot of attention recently as an attempt to open the black box of modern predictive algorithms (mostly neural nets). The interest is not confined to academia: policymakers and businesses have also realized that interpretability is key to warding off the potential dangers of the notoriously unstable ML models being deployed in business, public health, criminal justice, and so on. For example, the US military is dedicated to developing what it calls explainable AI (XAI) systems, the EU General Data Protection Regulation (GDPR) of 2016 contains the “right to explanation” of algorithmic decisions, and the Equal Credit Opportunity Act of the US asserts the right to specific reasons for denied credit. Interpretable ML has also spawned numerous start-ups, such as InterpretableAI, Clarifai, and Aignostics, to name a few.
The goal of this post is to give you an overview of interpretable ML: why it is so useful, where you might benefit from it, and some initial pointers to the methods that are out there. It will not cover the details of any specific technique.
So what is interpretable ML?
In short, interpretable ML means that the decision your algorithm makes can be translated into a domain that humans understand. For example, if the algorithm classifies images of cats, then one approach would be to simply highlight which pixels the algorithm used to arrive at its classification decision.
If you want to be precise, you should distinguish between interpretations and explanations. Interpretations cast the predicted class into a human-understandable domain, e.g. images, text, or rules. Explanations are simply a collection of the input features responsible for the model output; these can be anything you feed into your algorithm. More often than not, you’ll find that the two terms are used interchangeably.
What are the benefits of interpretable ML?
Besides the regulatory requirements mentioned above, interpretable ML is useful in many scenarios, which are not mutually exclusive:
- Building trust: When safety-critical decisions have to be made, e.g. in medical applications, it is important to provide explanations so that the domain expert involved can understand how the model came to its decision and thus decide whether or not to trust it. (Here is a paper that considers trust.)
- Failure Analysis: Other applications, such as autonomous driving, might not involve an expert when deployed. But if something goes wrong, interpretability methods can help to retrospectively inspect where bad decisions were made and to understand how to improve the system.
- Discovery: Imagine you have an algorithm that can accurately detect early-stage cancer and, on top of that, prescribe the optimal treatment. Being able to use this as a black box is great already, but it would be even better if experts could inspect why the algorithm does so well and subsequently gain insights into the mechanisms of cancer and the efficacy of treatments.
- Verification: When training ML models, it is often hard to tell how robust the model is (even if the test error is low) and why it does well in some cases but not in others. Especially insidious are so-called spurious correlations: features that correlate with the class you want to predict in the training data but are not the true underlying reason why this class is correct. There are great examples of spurious correlations, as well as cases documented in academia.
- Model Improvement: If your model does not perform well and you don’t know why, it is sometimes possible to look at explanations of the model’s decisions and identify whether the problem lies in the data or in the model structure.
It is important to note that there are many different approaches to interpretable ML, and not every approach fits every problem.
Rule-based systems, e.g. the expert systems that were successful in the 1970s and 1980s, are generally interpretable, as they typically represent decisions as a series of simple if-then rules. Here, the whole decision process, however complicated, can be traced. A major disadvantage is that they generally depend on manually defined rules, or on preparing the data in the form of arguments from which rules can be induced (i.e. in symbolic form).
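To make the idea of a traceable decision concrete, here is a minimal sketch of a rule-based system. The loan scenario, the rule names, and the thresholds are all invented for illustration; the point is only that the returned trace shows exactly which if-then rules were checked and which one fired.

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[Dict[str, float]], bool], str]

# Hypothetical rule base for a loan decision; names and thresholds are invented.
RULES: List[Rule] = [
    ("income < 20000", lambda x: x["income"] < 20000, "deny"),
    ("debt_ratio > 0.5", lambda x: x["debt_ratio"] > 0.5, "deny"),
    ("years_employed >= 2", lambda x: x["years_employed"] >= 2, "approve"),
]
DEFAULT_DECISION = "manual review"


def decide(applicant: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return (decision, trace); the trace lists every rule that was checked."""
    trace = []
    for description, condition, decision in RULES:
        fired = condition(applicant)
        trace.append(f"IF {description}: {'fired' if fired else 'skipped'}")
        if fired:
            # The first rule that fires determines the decision.
            return decision, trace
    return DEFAULT_DECISION, trace


decision, trace = decide({"income": 45000, "debt_ratio": 0.6, "years_employed": 5})
print(decision)
for step in trace:
    print(step)
```

For this applicant the first rule is skipped and the second one fires, so the trace itself is the explanation of the denial.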
Decision trees also follow the if-then schema, but they can be used out of the box on many data types. Unfortunately, even they can become incomprehensible if the decision is complex, and trees are certainly not the most powerful tool at our disposal. If your problem is not too complex and you want the model to be innately interpretable and easy to communicate, then they’re a great choice.
Linear models (e.g. linear regression) are great because they immediately give you an explanation in the form of weights associated with each input variable. Due to their nature, linear models can only capture simple relationships, and for high-dimensional input spaces, the explanation might not be understandable.
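As a small illustration of weights as explanations, the sketch below uses a hand-written linear model. The feature names, weights, and bias are made up rather than fitted to any data. Multiplying each weight by the corresponding input value gives a per-feature contribution, and these contributions add up (together with the bias) to the prediction.

```python
# Hypothetical linear model for a house price (in thousands); values are invented.
weights = {"rooms": 30.0, "age_years": -0.5, "distance_km": -2.0}
bias = 50.0


def contributions(x):
    """Per-feature contribution to the prediction: weight times input value."""
    return {name: weights[name] * x[name] for name in weights}


def predict(x):
    """Linear prediction: bias plus the sum of all contributions."""
    return bias + sum(contributions(x).values())


house = {"rooms": 3.0, "age_years": 20.0, "distance_km": 5.0}
prediction = predict(house)
parts = contributions(house)

# Rank features by the magnitude of their contribution to this prediction.
ranked = sorted(parts.items(), key=lambda kv: abs(kv[1]), reverse=True)
print(prediction, ranked)
```

With only three features this ranking is easy to read; with thousands of inputs the same list quickly stops being a useful explanation, which is the limitation mentioned above.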
If your problem is quite simple or fits the expert system setting, then one of the methods above should suffice. If you want explanations for more powerful state-of-the-art models (e.g. deep neural nets or kernel methods), keep reading.
Some models have built-in interpretability. For example, neural attention architectures learn weights on the inputs that can directly be seen as an explanation. Disentangling models, such as beta-VAE, can also be seen as explanation-producing models, since they provide us with meaningful factors of variation in the data.
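To show what reading attention weights as an explanation can look like, here is a toy sketch of dot-product attention over four tokens. The tokens, key vectors, and query vector are hypothetical and hand-picked; in a real model they would be learned. The softmax weights sum to one, and the token with the largest weight is the one the model "attends to" most.

```python
import math


def attention_weights(query, keys):
    """Softmax over dot-product scores between one query and each key."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical tokens with invented key vectors and an invented query.
tokens = ["the", "movie", "was", "great"]
keys = [[0.1, 0.0], [0.5, 0.2], [0.0, 0.1], [1.0, 1.5]]
query = [1.0, 1.0]

weights = attention_weights(query, keys)
most_attended = tokens[weights.index(max(weights))]
print(list(zip(tokens, weights)), most_attended)
```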
In contrast to the approaches above, post-hoc interpretability methods deal with providing interpretations after the model has already been trained. This means that you don’t have to change your model or training pipeline or retrain existing models to introduce interpretability. Some of these methods have the great advantage of being model-independent, meaning that you can apply them to any previously trained model. This means you can also easily compare explanations delivered by different models.
One of the best-known post-hoc interpretability techniques is Local Interpretable Model-agnostic Explanations (LIME). The basic idea is that, to locally explain the algorithm’s decision for a specific input, a linear model is trained to emulate the algorithm in only a small region around that input. This linear model is interpretable by nature and tells us how the output would change if we changed any input feature by a little.
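The local-surrogate idea can be sketched in a few lines of plain Python. This is a simplified, LIME-style illustration and not the reference implementation: the black-box function, the Gaussian sampling, the kernel width, and all function names are assumptions made for this example. It samples points around the input, weights them by proximity, and fits a weighted linear model whose slopes serve as the local explanation.

```python
import math
import random


def black_box(x):
    """Stand-in for an opaque model (hypothetical): nonlinear in x[0], linear in x[1]."""
    return x[0] ** 2 + 3.0 * x[1]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def local_linear_explanation(model, x0, n_samples=500, scale=0.1, seed=0):
    """Fit a proximity-weighted linear surrogate around x0; return (intercept, slopes)."""
    rng = random.Random(seed)
    d = len(x0)
    # Accumulate the weighted normal equations (X^T W X) beta = X^T W y.
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [0.0] * (d + 1)
    for _ in range(n_samples):
        offset = [rng.gauss(0.0, scale) for _ in range(d)]
        sample = [xi + oi for xi, oi in zip(x0, offset)]
        dist2 = sum(o * o for o in offset)
        weight = math.exp(-dist2 / (2.0 * scale ** 2))  # closer samples count more
        row = [1.0] + offset  # surrogate features are offsets from x0
        y = model(sample)
        for i in range(d + 1):
            xty[i] += weight * row[i] * y
            for j in range(d + 1):
                xtx[i][j] += weight * row[i] * row[j]
    beta = solve(xtx, xty)
    return beta[0], beta[1:]


intercept, slopes = local_linear_explanation(black_box, [1.0, 2.0])
print(intercept, slopes)
```

Around the point (1, 2) the slopes should come out close to 2 and 3, the local rates of change of the toy model, even though the model is not linear globally.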
SHapley Additive exPlanations (SHAP) is an approach that builds on Shapley analysis, which essentially judges the importance of features by training the model on a number of subsets of the available features and evaluating the effect of omitting features. The SHAP paper also draws connections to LIME and DeepLIFT.
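The Shapley idea of averaging a feature's effect over all subsets of the other features can be computed exactly for a tiny model. The sketch below is an illustration under stated assumptions, not the SHAP library: the toy model is invented, and instead of retraining on each feature subset, "omitted" features are replaced by a baseline value, which is a common simplification.

```python
from itertools import combinations
from math import factorial


def model(x):
    """Hypothetical toy model with an interaction between features 1 and 2."""
    return 2.0 * x[0] + x[1] * x[2]


def value(subset, x, baseline):
    """Model output when only the features in `subset` take their real values."""
    masked = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return model(masked)


def shapley_values(x, baseline):
    """Exact Shapley values: weighted average marginal contribution per feature."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i}, x, baseline)
                without_i = value(set(subset), x, baseline)
                phi[i] += weight * (with_i - without_i)
    return phi


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(x, baseline)
print(phi)
```

Here the attributions come out as approximately 2, 3, and 3: the interaction term between features 1 and 2 is split evenly between them, and the three values sum to the difference between the model output at the input and at the baseline. The cost grows exponentially with the number of features, which is why practical methods rely on approximations.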
Two other closely related branches of interpretability methods are propagation-based and gradient-based approaches. They either propagate the algorithm’s decision back through the model or make use of the sensitivity information provided by gradients of the model output. Prominent representatives are Deconvolution Networks, Guided Backpropagation, Grad-CAM, Integrated Gradients, and Layer-Wise Relevance Propagation (LRP).
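As a rough sketch of the gradient-based family, the code below computes a plain gradient saliency and a Riemann-sum approximation of Integrated Gradients for a tiny model. The logistic model with fixed weights, the all-zero baseline, and the use of finite differences in place of backpropagation are assumptions made to keep the example self-contained.

```python
import math


def score(x):
    """Hypothetical differentiable model: a logistic unit with fixed weights."""
    w, b = [1.5, -2.0, 0.5], 0.1
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of f at x (stands in for backpropagation)."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(f, point)):
            total[i] += g / steps
    return [(xi - b) * t for xi, b, t in zip(x, baseline, total)]


x = [1.0, 0.5, 1.0]
baseline = [0.0, 0.0, 0.0]

# Saliency: how sensitive the output is to each input feature at x.
saliency = [abs(g) for g in gradient(score, x)]
# Integrated Gradients: signed attributions relative to the baseline.
attributions = integrated_gradients(score, x, baseline)
print(saliency, attributions)
```

A useful sanity check is that the Integrated Gradients attributions sum (up to approximation error) to the difference between the model output at the input and at the baseline.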
Many of these approaches have been designed for (convolutional) neural networks only. A notable exception is LRP, which has also been applied to, e.g., kernel methods and LSTMs. In 2017, LRP received additional theoretical grounding and an extension, the Deep Taylor Decomposition. Some propagation-/gradient-based methods are neatly implemented and ready to use in this toolbox.
I tried to capture the most important techniques, but there are certainly many more, and the number is increasing with every relevant conference.
To wrap it up, here are some points to take away:
- Interpretable ML plays an increasingly important role and is already often a (regulatory) requirement.
- It can be helpful in many scenarios, e.g. to build trust with the user or to get a better understanding of the data and the model.
- There is a wealth of methods out there, from very simple tools with a long tradition (rule-based systems or linear regression) to techniques that can be used with modern models such as neural networks.
I hope you learned something useful. If you have any comments or feedback, let me know!