
Source: Deep Learning on Medium

A Review of Different Interpretation Methods in Deep Learning (Part 2: Input × Gradient, Layerwise Relevance Propagation, DeepLIFT, LIME)[In progress…]

Welcome to the second article in the series “A Review of Different Interpretation Methods in Deep Learning”. As its name suggests, this series aims to introduce you to some of the most frequently used interpretation (explanation) methods in deep learning. As a brief introduction, interpretation methods can help us understand why a deep neural network predicts what it predicts, and whether the high accuracy of a model’s predictions is meaningful.
Before proceeding any further, I highly recommend those of you who have not read the first article of this series to go through it (it is available here), as some of the methods covered here build upon concepts introduced there. In this post, I will cover four more important explanation methods: Input × Gradient, Layer-wise Relevance Propagation (LRP), DeepLIFT, and LIME.

Now let’s go through the details of these methods (as I already covered three of them in the previous article, the numbers here start from 4!):

4. Input × Gradient
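Input × Gradient attributes importance to each feature by multiplying the input, elementwise, by the gradient of the output with respect to that feature. A minimal sketch on a toy one-layer model (the weights and inputs below are hypothetical, chosen so the gradient has a closed form):

```python
import numpy as np

# A hypothetical one-layer "network": f(x) = sigmoid(w . x).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, -2.0, 0.5])
x = np.array([1.0, 0.5, 2.0])

p = sigmoid(float(w @ x))            # model output
grad = p * (1.0 - p) * w             # df/dx for f(x) = sigmoid(w . x)
attribution = x * grad               # Input × Gradient, elementwise

# Features pushing the output up get positive scores, and vice versa.
assert attribution[0] > 0 and attribution[1] < 0
```

For a real network the gradient would come from automatic differentiation rather than a closed form, but the attribution step itself is just this elementwise product.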

5. Layer-wise Relevance Propagation (LRP)

Layer-wise Relevance Propagation (LRP) aims to explain the predictions of a neural network by introducing a set of constraints and solving them; any solution to these constraints is considered an acceptable explanation for the predictions of the network. In LRP, each dimension d of every layer l of the network has a relevance score R, and the following should hold for the scores:

  1. The sum of relevance scores is constant across the different layers of the network (relevance conservation):

f(x) = … = Σ_d R_d⁽ˡ⁺¹⁾ = Σ_d R_d⁽ˡ⁾ = … = Σ_d R_d⁽¹⁾

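This conservation constraint can be checked numerically. Below is a minimal sketch of the basic LRP-0 redistribution rule for a tiny two-layer ReLU network; the weights and inputs are hypothetical and kept positive so the denominators stay well-behaved:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer ReLU network with hypothetical positive weights.
W1 = rng.uniform(0.1, 1.0, (4, 3))   # input -> hidden
W2 = rng.uniform(0.1, 1.0, (3, 1))   # hidden -> output
x = rng.uniform(0.1, 1.0, 4)

a1 = np.maximum(0.0, x @ W1)         # hidden activations
out = float(a1 @ W2)                 # network output f(x)

def lrp_step(a, W, R_upper, eps=1e-9):
    """Redistribute upper-layer relevance R_upper onto activations a
    proportionally to their contributions a_j * w_jk (LRP-0 rule)."""
    z = a @ W + eps                  # total contribution per upper neuron
    s = R_upper / z                  # relevance per unit of contribution
    return a * (W @ s)               # R_j = a_j * sum_k w_jk * s_k

R2 = np.array([out])                 # start: relevance at the output
R1 = lrp_step(a1, W2, R2)            # relevance of the hidden layer
R0 = lrp_step(x, W1, R1)             # relevance of the input layer

# Conservation: relevance sums stay (approximately) equal across layers.
assert abs(R1.sum() - out) < 1e-6
assert abs(R0.sum() - out) < 1e-6
```

In practice, variants such as the ε- or αβ-rules are used to stabilize the redistribution when contributions can cancel, but the conservation idea is the same.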
6. Deep Learning Important FeaTures (DeepLIFT)

DeepLIFT was proposed in the paper Learning Important Features Through Propagating Activation Differences. It explains the difference of the output from some reference output in terms of the difference of the input from some reference input.

Summation-to-delta property: with Δt = t − t⁰ denoting the difference of the output from its reference, and Δxᵢ the difference of input i from its reference, the contribution scores C_{ΔxᵢΔt} must satisfy

Σᵢ C_{ΔxᵢΔt} = Δt
But what is the reference value for a neuron? It is simply its activation on the reference input.

Compared with plain gradients, DeepLIFT has several advantages: it can propagate an importance signal even when the gradient is zero, it allows separate consideration of positive and negative contributions at non-linearities, and it assigns scores using a single back-propagation.
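The zero-gradient point can be sketched with DeepLIFT’s Rescale rule on a single ReLU neuron. The weights, input, and reference below are hypothetical, chosen so the ReLU is inactive at the input (gradient zero) but active at the reference:

```python
import numpy as np

def relu(z):
    return max(0.0, z)

def rescale_multiplier(z, z_ref, eps=1e-12):
    """Rescale rule: multiplier m = Δoutput / Δinput across the ReLU."""
    if abs(z - z_ref) < eps:
        return 0.0               # nothing to explain
    return (relu(z) - relu(z_ref)) / (z - z_ref)

w = np.array([2.0, -1.0])        # hypothetical weights of one neuron
x = np.array([1.0, 3.0])         # actual input
x_ref = np.array([0.5, 0.0])     # reference input

z, z_ref = float(w @ x), float(w @ x_ref)   # z = -1.0, z_ref = 1.0
m = rescale_multiplier(z, z_ref)            # m = 0.5

# Contribution of each input difference: C_i = m * w_i * (x_i - x_ref_i).
# The ReLU gradient at z = -1.0 is zero, yet these are non-zero.
C = m * w * (x - x_ref)

# Summation-to-delta: contributions sum to Δt = relu(z) - relu(z_ref).
assert abs(C.sum() - (relu(z) - relu(z_ref))) < 1e-9
```

A gradient-based score would assign zero to every input here, while DeepLIFT still explains the full change in the output relative to the reference.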

7. Local Interpretable Model-agnostic Explanations (LIME)

LIME is an algorithm proposed in the paper “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. It explains the predictions of any classifier or regressor by locally approximating it with an interpretable linear model. As the phrase “Any Classifier” in the title suggests, LIME is model-agnostic, which makes it somewhat unique, as there are not many methods that work regardless of the underlying model.
The first thing to note is that an interpretable explanation can be achieved by using binary vectors indicating the presence or absence of interpretable features (e.g., a bag of words for text datasets or superpixels for image datasets), even though the classifier itself may use more complex features.

LIME defines an explanation as a model g ∈ G, where G is the class of potentially interpretable models. Note that g acts over the interpretable components, i.e., its input is a binary vector of the kind described above. Ω(g) is a measure of the complexity of g, and π_x(z) (for which an exponential kernel is a good choice) is a proximity measure between an instance z and x, used to define a locality around x. The objective is therefore the sum of a locality-aware loss L(f, g, π_x), which measures how unfaithful g is in approximating f within that locality, and the complexity Ω(g):

ξ(x) = argmin_{g∈G} L(f, g, π_x) + Ω(g)

Since this cannot be minimized exactly for a black-box f, it is approximated by drawing samples and weighting them by the proximity measure.

Concretely, instances around xʹ are sampled by perturbing it to obtain zʹ; the original representation z is then recovered from zʹ, and f(z) is used as the label for training g.
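This sample-weight-fit loop can be sketched end to end with numpy. The black-box function f and all constants below are hypothetical; the "interpretable model" is a weighted least-squares linear fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black box over 5 interpretable binary features:
# it secretly depends mostly on features 0 and 2.
def f(Z):
    return 3.0 * Z[:, 0] + 2.0 * Z[:, 2] + 0.1 * Z[:, 4]

n_features, n_samples = 5, 200
x_prime = np.ones(n_features)               # instance: all features present

# 1) Sample perturbations z' around x' by randomly dropping features.
Z = rng.integers(0, 2, (n_samples, n_features)).astype(float)

# 2) Query the black box on the recovered samples.
y = f(Z)

# 3) Weight each sample by an exponential kernel on its distance to x'.
dist = np.linalg.norm(Z - x_prime, axis=1)
weights = np.exp(-(dist ** 2) / 2.0)

# 4) Fit the interpretable linear model g by weighted least squares.
sw = np.sqrt(weights)
coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)

# The explanation ranks features 0 and 2 as most important.
assert np.allclose(coef, [3.0, 0.0, 2.0, 0.0, 0.1], atol=1e-6)
```

The real LIME additionally restricts g to a few features (e.g., via LASSO) to keep Ω(g) small; the weighted fit above is the core of the locality-aware loss.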

8. Integrated Gradients

Integrated Gradients was proposed in the paper Axiomatic Attribution for Deep Networks (ICML 2017). Like other gradient-based methods, it leverages the local linearity of neural networks.

The paper takes an axiomatic approach, proposing two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods should satisfy. It also uses a baseline input for which the prediction is neutral. A practical drawback is that numerically obtaining high-quality integrals adds computational overhead.

  1. Sensitivity: if an input and a baseline differ in only one feature yet receive different predictions, then that differing feature should be given a non-zero attribution. DeepLIFT and LRP tackle the sensitivity issue by employing a baseline and, in some sense, computing discrete gradients instead of gradients at the input.
  2. Implementation Invariance: the attributions should always be identical for two functionally equivalent networks, i.e., networks whose outputs match for all inputs.

Integrated Gradients combines implementation invariance with the sensitivity of baseline techniques such as LRP and DeepLIFT.

It considers the straight-line path from the baseline xʹ to the input x and computes the gradients at all points along this path; the integrated gradients are obtained by accumulating these gradients.

It satisfies several desirable properties stated in the paper: Implementation Invariance, Sensitivity, Linearity, Completeness, and Symmetry-Preserving.

In practice, the network needs to be evaluated on the inputs along the path roughly 20 to 300 times.

The integrated gradients along the i-th dimension for an input x and baseline xʹ are:

IntegratedGradsᵢ(x) = (xᵢ − xʹᵢ) × ∫₀¹ ∂F(xʹ + α(x − xʹ))/∂xᵢ dα

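The integral above is usually approximated with a Riemann sum over the path. A minimal sketch on a toy differentiable function (hypothetical, not a real network), which also checks the Completeness property:

```python
import numpy as np

# Toy differentiable "model" with a closed-form gradient.
def f(x):
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    return np.array([2.0 * x[0], 3.0])

def integrated_gradients(x, baseline, steps=100):
    """Approximate the path integral with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

x = np.array([2.0, 1.0])
baseline = np.zeros(2)
ig = integrated_gradients(x, baseline)

# Completeness: attributions sum to f(x) - f(baseline).
assert abs(ig.sum() - (f(x) - f(baseline))) < 1e-6
```

For a real network, `grad_f` would be a backward pass, and the 20–300 calls mentioned above correspond to the `steps` parameter of this sum.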
9. Testing with Concept Activation Vectors (TCAV)

TCAV was proposed in the paper Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors. A great feature of TCAV is that the user can conduct hypothesis testing on the fly for any concept that makes sense to them.
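The core object is the concept activation vector (CAV): the normal of a linear classifier that separates layer activations of concept examples from those of random examples. A minimal sketch of that fitting step on synthetic activations (all data below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic layer activations: "concept" examples are shifted along
# dimension 0 relative to random examples.
d = 8
shift = np.zeros(d)
shift[0] = 2.0
concept = rng.normal(0.0, 1.0, (50, d)) + shift
randoms = rng.normal(0.0, 1.0, (50, d))

# Fit a linear separator; its (normalized) normal vector is the CAV.
X = np.vstack([concept, randoms])
y = np.hstack([np.ones(50), -np.ones(50)])
cav, *_ = np.linalg.lstsq(X, y, rcond=None)
cav /= np.linalg.norm(cav)

# The CAV recovers the direction that distinguishes the concept.
assert int(np.argmax(np.abs(cav))) == 0
```

In the full method, the TCAV score is then the fraction of class inputs whose directional derivative along the CAV is positive, which is what enables the on-the-fly hypothesis testing described above.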

10. SmoothGrad (Adding Noise)

SmoothGrad was proposed in the paper SmoothGrad: removing noise by adding noise. As the title suggests, it adds noise to copies of the input and averages the resulting gradients to obtain a less noisy sensitivity map.
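The averaging step can be sketched on a deliberately wiggly 1-D toy function (hypothetical, standing in for a network whose gradient fluctuates rapidly):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy function f(x) = sin(5x) + x: its raw gradient oscillates around
# the underlying trend slope of 1.0.
def grad_f(x):
    return 5.0 * np.cos(5.0 * x) + 1.0

def smooth_grad(x, sigma=0.5, n=5000):
    """Average the gradient over n noisy copies of the input."""
    noise = rng.normal(0.0, sigma, n)
    return float(np.mean(grad_f(x + noise)))

x0 = 0.3
raw = grad_f(x0)            # raw local slope, dominated by the wiggle
smoothed = smooth_grad(x0)  # much closer to the trend slope of 1.0
```

The choice of the noise scale sigma matters: too small and the wiggles survive, too large and the estimate blurs across unrelated regions of the input space.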

11. Neuron Conductance