Can We Trust AI?

Source: Deep Learning on Medium

If you are a technically oriented person like me, you have probably gotten the impression that there are two dominant upcoming technologies: blockchain and AI. Since both will one day necessarily target non-technical users, an important question to ask ourselves is whether we can trust them to do what they were designed to do. For AI in particular, this question is even harder to answer, since technologies like deep learning are difficult to grasp even for experts in the domain. What’s more, the behavior of an AI system is strongly influenced by the data it was trained on, and any inherent bias in that data will be reflected in the deployed models. Even more striking, many of these systems will potentially be used in literal life-or-death situations such as self-driving cars, medicine, or military equipment. The question of trust is therefore inevitable and central to any further effort to spread the use of AI algorithms. As a research scientist and engineer working on the development of medical devices and AI systems, I necessarily have to take part in this discussion, not least because doctors are loath to adopt automated algorithms they do not understand and therefore cannot check.

In this post I would like to highlight some reasons why AI systems are hard to interpret, as well as potential sources of unsafe or unexpected AI behavior. Furthermore, I will describe some recent developments in the field of deep neural network inspection and ways to make deep learning models more understandable from a human perspective.

Why Is AI So Hard To Understand For Us Humans?

In recent years, so-called end-to-end neural network models, designed to solve various machine learning problems, have become prevalent. The advantage of these models is that they can be trained and applied without extensive feature engineering, allowing the network to “find its own way” of solving a potentially very complicated pattern-matching problem. However, this also means that the solution can be very complex and opaque. It may be impossible for a human to guess the output of such a network for a given input. What’s more, most of the time we cannot even say that similar inputs lead to similar outputs: although the learned function is technically continuous, it can change drastically under tiny perturbations of the input. This is because the functions neural networks learn are very high-dimensional and their shape is extremely complex.

We humans like systems that are continuous in some sense. If we deal with, say, Gaussian process regression in a Bayesian optimization setting, we explicitly formulate the problem in such a way that the solution will be continuous. This reflects our prior understanding of the problem. For neural networks, it is hard to incorporate such prior knowledge into the model.

How to incorporate prior knowledge into a deep neural network?

One way of incorporating prior knowledge into a deep neural network model is to impose a prior on the individual weights of the network; this is known as weight decay in the deep learning community [1]. Weight decay punishes very large weights in the network, preventing it from focusing on a particular feature too much and thus avoiding overfitting.
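As a minimal sketch of how weight decay enters training, here is a toy linear regression in NumPy; the data, learning rate, and decay strength are all hypothetical choices for illustration. The L2 penalty term is added directly to the gradient, pulling the weights toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression data (hypothetical, for illustration only).
X = rng.normal(size=(100, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

def fit(weight_decay, lr=0.05, steps=500):
    """Gradient descent on MSE plus an L2 penalty on the weights."""
    w = np.zeros(5)
    for _ in range(steps):
        # Gradient of the data term plus the weight-decay term 2*lambda*w.
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * weight_decay * w
        w -= lr * grad
    return w

w_plain = fit(weight_decay=0.0)
w_decay = fit(weight_decay=1.0)

# The L2 prior pulls the weights toward zero, shrinking their norm.
print(np.linalg.norm(w_plain), np.linalg.norm(w_decay))
```

In probabilistic terms, this penalty corresponds to a zero-mean Gaussian prior on each weight, which is exactly the "prior on the individual weights" described above.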

We can also convey our prior knowledge through the network architecture or even through the data themselves. The latter may be realized e.g. in the form of data augmentation. In data augmentation, we feed the network with some transformation of the input data in order to make it invariant to that transformation. Imagine, for instance, the good ol’ dogs-or-cats classification problem. For this problem, we may randomly rotate or mirror the images to make the network aware that these transformations do not make a difference for the classification.
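A minimal sketch of such an augmentation step in NumPy (the 32×32 image and the choice of mirroring and 90-degree rotations are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image, rng):
    """Randomly mirror and rotate an image by a multiple of 90 degrees,
    conveying that these transformations should not change the label."""
    if rng.random() < 0.5:
        image = np.fliplr(image)          # horizontal mirror
    k = rng.integers(0, 4)                # 0, 90, 180, or 270 degrees
    return np.rot90(image, k)

# A stand-in 32x32 RGB "cat" image (random pixels for illustration).
image = rng.random((32, 32, 3))
batch = [augment(image, rng) for _ in range(8)]
```

Each epoch the network then sees a differently transformed copy, which encourages invariance to those transformations without changing the architecture at all.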

Unfortunately, none of these possibilities is principled in the sense that we could explicitly tell the model how to behave. In other words, we do not impose priors on the learned input-output function directly. A direct way would consist of a mathematical constraint on the resulting function, or of calculating the full posterior distribution from the prior and the likelihood, as is done in Bayesian neural networks. This would furthermore allow us to estimate the confidence the network has in a certain prediction by considering the shape of the output distribution. In general, however, it is impossible to calculate the posterior distribution exactly for deep neural network models, due to the enormous amount of computation this would require for large networks.

Why Does AI Behave Unexpectedly Or Even Dangerously?

In general, we could argue that the unexpectedness of neural networks lies in their inconsistency in dealing with similar inputs. Another difficulty is that we can only calculate a maximum-likelihood point estimate of the output, rather than a full distribution that would give us confidence values.

Deep neural networks are particularly susceptible to so-called adversarial attacks: inputs specifically designed to fool the network. This is nicely demonstrated for computer vision in [2] (Figure 1). To fool the network, the images on the left (which were classified correctly) were perturbed by specifically crafted noise (middle column). The resulting images (right column) were all classified as “ostrich”.
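The attack in [2] uses a more involved optimization, but the core idea can be illustrated with the simpler fast gradient sign method of Goodfellow et al. Here is a toy sketch in NumPy, with a logistic model standing in for a trained network; all weights and inputs are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny fixed logistic "classifier" standing in for a trained network.
w = rng.normal(size=100)
x = rng.normal(size=100)

def predict(x):
    return sigmoid(w @ x)              # P(class = 1)

# Fast gradient sign method: step the input in the direction that
# increases the loss for the true label.
y_true = 1.0
p = predict(x)
grad_loss_x = (p - y_true) * w         # d(cross-entropy)/dx for this model
epsilon = 0.25
x_adv = x + epsilon * np.sign(grad_loss_x)

print(predict(x), predict(x_adv))      # confidence drops after the attack
```

Because the perturbation follows the sign of the gradient in every dimension at once, a visually tiny change in the input can produce a large change in the output, which is exactly the inconsistency discussed above.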

Figure 1: For the images in the left column, AlexNet returns the correct class. For the images in the right column, the network fails spectacularly. (source: Szegedy, Christian, et al. “Intriguing properties of neural networks.” arXiv preprint arXiv:1312.6199 (2013).)

What the authors present here can be viewed as a form of overfitting: adversarial vulnerability and the input-output inconsistency discussed above are closely related phenomena.

Another issue we have to deal with when talking about the dangers of neural networks is the inherent bias of machine learning models in general. Bias is typically introduced into a model through the selection of training data: if the data are biased, the model will be too. There are quite obvious cases of bias in neural networks. Take, for example, the famous ImageNet database, which was built by collecting images found on the internet. The database therefore inherits the biases of its internet sources, and every model trained on it inherits them as well. This is nicely shown in [3], where the authors demonstrate that models trained on ImageNet, such as ResNet-101, associate the basketball class preferentially with images of black people holding a basketball rather than white people, and the ping-pong ball class with images of Asian people dressed in red (Figure 2).

Figure 2: Pairs of internet images along with their predictions by ResNet-101. (source: Stock, Pierre, and Moustapha Cisse. “Convnets and imagenet beyond accuracy: Explanations, bias detection, adversarial examples and model criticism.” arXiv preprint arXiv:1711.11443 (2017).)

How To Inspect Deep Learning Systems And Make AI More Transparent?

Multiple possibilities for inspecting deep neural networks have been studied over the years of deep learning development. A very simple one is to visualize the activations of individual layers of the network. For an individual input, however, this does not tell us much beyond which neurons respond to that particular input. What we would prefer instead is to know what the optimal image of a particular class (say, a prototypical dog or cat) looks like to the network. This is an optimization problem: we perform gradient ascent on a randomly initialized image, adjusting its pixels to maximize the score of the target class. This is essentially the same procedure as neural network training, except that we keep the network weights fixed and optimize the input instead. An example of this can be found in [4] (see also Figure 3).
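A toy sketch of this gradient-ascent-on-the-input idea, using a fixed linear scorer in place of a trained CNN (all values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed linear "network" scoring a single class; this is a stand-in
# for a trained CNN, whose weights would likewise be held fixed.
w = rng.normal(size=(32, 32))

def class_score(image):
    return np.sum(w * image)

# Start from a small random image and perform gradient ascent on it.
image = 0.01 * rng.normal(size=(32, 32))
initial = class_score(image)

lr = 0.1
for _ in range(50):
    grad = w                  # d(score)/d(image) for this linear scorer
    image = image + lr * grad

final = class_score(image)    # the score has grown; the image now
                              # reflects what the scorer responds to
```

For a real network, the gradient of the class score with respect to the image would be computed by backpropagation, and the optimized image is the "prototype" the network has learned for that class.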

Figure 3: Visualizations for various layers of a convolutional neural network for specific inputs. (source: Yosinski, Jason, et al. “Understanding neural networks through deep visualization.” arXiv preprint arXiv:1506.06579 (2015).)

In a similar way, we can also visualize spatial class saliency for individual inputs [5]. This method lets us see which parts of the input the network focuses on when performing classification (see Figure 4). In detail, it evaluates the gradient of the class score with respect to the input image, at the target input image.
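For a linear model this gradient can be written down directly; here is a toy NumPy sketch with a random stand-in for a trained classifier (a real network would obtain the gradient by backpropagation):

```python
import numpy as np

rng = np.random.default_rng(1)

# A fixed linear scorer standing in for a trained classifier.
w = rng.normal(size=(8, 8))

def class_score(image):
    return np.sum(w * image)

image = rng.normal(size=(8, 8))   # the input whose saliency we want

# Saliency map: gradient of the class score w.r.t. the input pixels.
# For this linear model the gradient is simply w.
grad = w
saliency = np.abs(grad)           # per-pixel importance magnitudes

# The brightest pixel is the one the score is most sensitive to.
top = np.unravel_index(np.argmax(saliency), saliency.shape)
```

Rendering `saliency` as a heatmap over the input yields images like the one in Figure 4.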

Figure 4: This visualization lets us see which parts of the image are the most important to the network when classifying the image as a dog. (source: Simonyan, Karen, Andrea Vedaldi, and Andrew Zisserman. “Deep inside convolutional networks: Visualising image classification models and saliency maps.” arXiv preprint arXiv:1312.6034 (2013).)

Although the previously mentioned methods have recently been superseded by better and more general methods such as Grad-CAM [6] (at least for CNNs), the basic idea of these types of algorithms remains the same.

Another, more involved possibility is to study the training process itself. In [7], the authors look at training from an information-theoretic perspective. One of their main results is that training proceeds in two distinct phases: a fitting phase and a compression phase. During the fitting phase, the network weights adjust so as to fit the labels. During the second and much longer compression phase, the network tries to find the most compressed representation of the input that still fits the labels (a so-called minimal sufficient statistic). It is this second phase that is largely responsible for overfitting: for small sample sizes, the compression the network finds does not generalize to the true input-output distribution. In Figure 5, we can see that for small sample sizes (left side), the information retained about the inputs and labels actually decreases during the second phase, whereas larger training datasets do not suffer from this weakness (right side).

Figure 5: Overfitting in terms of information. (source: Shwartz-Ziv, Ravid, and Naftali Tishby. “Opening the black box of deep neural networks via information.” arXiv preprint arXiv:1703.00810 (2017).)

Unfortunately, in real life we still cannot reliably predict overfitting and input-output inconsistencies. However, the paper gives a nice explanation of why techniques such as early stopping tend to work well.
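The early-stopping heuristic itself is simple to sketch (the patience value and the synthetic validation curve below are hypothetical):

```python
import numpy as np

def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training should stop: the first point
    where validation loss has not improved for `patience` epochs."""
    best, best_epoch = np.inf, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

# Synthetic validation curve: the model improves, then starts to overfit.
val_losses = [1.0, 0.7, 0.5, 0.45, 0.44, 0.46, 0.48, 0.55, 0.6, 0.7]
stop = early_stop_epoch(val_losses)   # stops shortly after the minimum
```

In the information-theoretic picture above, stopping here cuts training off before the harmful part of the compression phase can degrade the representation.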

What’s next?

A lot of research on neural network safety and stability still lies ahead of us. In recent years, a lot of brainpower went into improving the accuracy of neural networks on various datasets. In computer vision, for instance, we are fast approaching human-level performance on large datasets such as ImageNet. Since this performance will necessarily plateau at some point, I think it makes sense to shift our attention to how well these networks perform in real life. Studying the training process of neural networks will give us additional insight into overfitting. Another large research focus will be detecting and preventing bias in training datasets and machine learning models. The importance of this cannot be overstated, since machine learning is increasingly used in areas where decisions may have dramatic consequences, such as public safety, the economy, or medicine.

In the meantime, I would suggest taking great care when selecting training and testing datasets for your own models (if you have control over them). In my opinion, dataset collection is the most crucial step in neural network training; done correctly, it will make the model perform much better in the real world. Additionally, it is advisable to monitor your deployed models consistently, check their performance, and make adjustments if necessary, especially if you detect some consistent bias.