Deep Learning & Healthcare: All That Glitters Ain’t Gold


The Two Sides of Deep Learning

Why does everyone love Deep Learning?

Contrary to traditional Machine Learning (ML) algorithms, Deep Learning is fueled by massive amounts of data and requires high-end machines with powerful GPUs to run within a reasonable timeframe. Both of these requirements are expensive, so why do companies and research labs think the juice is worth the squeeze?

In traditional Machine Learning techniques, most of the applied features need to be identified by a domain expert in order to reduce the complexity of the data and make patterns more visible for the learning algorithms to work on. The biggest advantage of Deep Learning algorithms is that they try to learn high-level features from the data by themselves. This theoretically eliminates the need for domain expertise and hand-crafted feature extraction.

An example of pattern classification in Radiology. Contrary to traditional ML algorithms, DL frameworks fold the feature-extraction step into the learning process itself.

In complex problems where a high level of automation is required but domain understanding for feature engineering is lacking (e.g. Image Classification, Natural Language Processing, Robotics), Deep Learning techniques are skyrocketing, reaching never-seen-before levels of accuracy.

Deep Learning Applications in Healthcare

By processing large amounts of data from various sources like radiographic images, genomic data and electronic health records, Deep Learning can help physicians analyze information and detect multiple conditions, addressing pressing healthcare concerns such as reducing the rate of misdiagnosis and predicting the outcome of procedures. Here are some well-known medical areas where Deep Learning is currently showing off:

  • Tumor detection: the adoption of Convolutional Neural Networks (CNNs) significantly improves the early detection of cancer, reaching very high accuracy in problems like breast cancer detection on screening mammography [Shen et al. 2019]. In this field, DL algorithms are approaching, or even surpassing, the accuracy of human diagnosticians at identifying important features in diagnostic imaging studies [Haenssle et al. 2018] (see the sketch after this list).
  • Hospital readmission, length-of-stay and inpatient mortality forecasting: DL-powered algorithms have access to tens of thousands of predictors for each patient, including free-text notes, and automatically identify which data are important for a particular prediction, without hand-selection of variables deemed important by an expert [Rajkomar et al. 2018].
  • Drug discovery and precision medicine: the discovery of a new drug is always met with excitement by the academic community and the general public, but the drug development cycle is slow and expensive, and fewer than 10% of candidates make it to market. DL can be used to automatically produce fingerprints and more effective features, or for de novo drug design, reducing the cost of the process [Xu et al. 2018].
  • Natural Language Processing: the introduction of EHRs in medical centers around the world unlocked a new source of information for healthcare providers: free text. Extracting actionable information from unstructured data helps in many aspects of healthcare, like summarization, automated reporting, question answering and, of course, decision making. However, clinical text is often fragmented, ungrammatical and telegraphic, and makes heavy use of acronyms and abbreviations, which can stump even the smartest NLP algorithms.
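
To make the “learned features” idea in the first bullet concrete, here is a minimal, hypothetical CNN classifier in PyTorch. It is only a sketch: the single-channel 128x128 input, the layer sizes and the two output classes are illustrative assumptions, not the actual architecture from Shen et al. 2019.

```python
# Minimal CNN sketch for binary image classification (illustrative only).
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional filters are learned from data: no hand-crafted features.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 128 -> 64 -> 32 after two poolings, with 32 channels.
        self.classifier = nn.Linear(32 * 32 * 32, 2)

    def forward(self, x):
        x = self.features(x)        # learned feature maps
        x = torch.flatten(x, 1)
        return self.classifier(x)   # logits, e.g. benign vs malignant

model = TinyCNN()
scan = torch.randn(1, 1, 128, 128)  # one fake grayscale scan
print(model(scan).shape)            # torch.Size([1, 2])
```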

However, as the title says, even in Deep Learning not all that glitters is gold. Data scientists keep putting a lot of effort into all of the aforementioned applications, but some of them expose limitations that seem too hard to overcome in real use-case scenarios, forcing DL-based methods to remain relegated to a research-only quarantine zone. But why do these methods excel in some areas and struggle in others? Why do we struggle to achieve significant progress in real use-case scenarios?

Here we will focus on two of the most important conceptual issues of Deep Learning in Healthcare.

Deep Learning & Healthcare Issues: Interpretability and Causality

Dealing with Healthcare means dealing with people’s lives. This implies carefulness, confidence, transparency, caution, sensitivity and the ability to explain why and how we end up with a certain diagnosis. In the same way we expect to find these qualities in physicians and surgeons, we should also demand them of our algorithms. And here is where Deep Learning shows its limits.

Let’s take Natural Language Processing, for instance. Nowadays, the amount of human-generated written and spoken data in healthcare is massive. It has been estimated that nearly 80% of healthcare data remains unstructured and untapped after it is created. Clinical notes are probably the most ignored input in healthcare, not because they are uninformative but simply because this type of data is hard to handle.
We already mentioned how leveraging this kind of information can lead to solid improvements in a model’s accuracy, but is performance all that matters?

A very popular way to process text for prediction in healthcare is to use word embeddings: multi-dimensional, dense, numeric representations of words produced by various types of Neural Networks.
A patient’s clinical notes can then be combined in different ways to obtain document embeddings or patient embeddings. Since we are now dealing with numeric vectors instead of text, we can simply feed them into our favourite classification algorithm.

Example of a patient-embeddings pipeline: the sentences S from the clinical notes of patient P1 are collected and broken down into words W. Deep Learning is then used to convert these words W into numeric vectors w, which are aggregated twice (words into sentences, sentences into a patient) to obtain the patient embedding p, itself a vector. This embedding can now become part of the input to any classification algorithm we want.
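
As a rough sketch of the pipeline in the caption above: in a real system the word vectors would come from a trained neural model (e.g. word2vec), but here a toy random vocabulary and simple mean-pooling stand in for both aggregation steps. All names and dimensions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["patient", "reports", "chest", "pain", "no", "fever"]
word_vec = {w: rng.normal(size=4) for w in vocab}  # w: one dense vector per word

def sentence_embedding(sentence):
    # S -> W -> w: tokenize, look up word vectors, mean-pool into one vector
    words = [word_vec[t] for t in sentence.lower().split() if t in word_vec]
    return np.mean(words, axis=0)

def patient_embedding(notes):
    # combine the sentence vectors once more into a single patient vector p
    return np.mean([sentence_embedding(s) for s in notes], axis=0)

p1 = patient_embedding(["Patient reports chest pain", "No fever"])
print(p1.shape)  # (4,) -- ready to feed into any classifier
```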

Let’s now suppose we developed a model for the early detection of cancer that sees an astonishing 30% accuracy boost when we include unstructured data through 300-dimensional patient embeddings. We can reasonably guess, then, that our model is using some very relevant information within the clinical notes to assess the patient’s condition.

But what is this relevant information?

As we said, embeddings are just dense numeric vectors. Converting words into vectors with Deep Learning and merging them together completely shuffles the cards on the table, making it impossible to trace which combinations of words and sentences were responsible for the patient’s classification.
Attention mechanisms and coefficients can only tell us which components are most relevant for predicting the outcome, but since we have lost the connection between words and the components of the embeddings, we cannot really understand why those components are so important.
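
To see the dead end concretely, here is a toy sketch with synthetic data and a plain logistic regression standing in for the real model: the classifier happily ranks embedding components by importance, yet nothing maps a component index back to the words that produced it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 300))  # 200 patients, 300-dim patient embeddings
# Synthetic label driven by one embedding component (chosen arbitrarily).
y = (X[:, 217] + 0.1 * rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)
top = np.argsort(np.abs(clf.coef_[0]))[::-1][:3]
print(top)  # e.g. [217 ...] -- "component #217 matters", but which words built it?
```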

Accuracy vs Interpretability trade-off (© Data Science Ninja)

Even if the model reaches a better accuracy, we can’t diagnose a disease just because component #217 says so. Using dimensionality reduction is fine, but we must be able to revert the process if we need to understand the origin of the decision and evaluate it within a bigger (human) knowledge framework.
Which leads us to the causality issue.

Yoshua Bengio, AI pioneer and Turing Award winner for his contributions to the development of Deep Learning, recently stated that DL has to learn more about cause and effect in order to achieve its full potential.
In other words, he says, Deep Learning needs to start asking why things happen.

Deep Learning is fundamentally blind to cause and effect. Unlike real doctors, deep learning algorithms cannot explain why a particular image or clinical note may suggest a disease. Understanding causation, especially in healthcare, would make existing AI systems smarter and more efficient: a robot that internalizes causal relationships can formulate hypotheses, imagine scenarios and learn how to address them before they actually happen.

This mindset is peculiar to human nature, and cognitive science experiments have shown that understanding causality is fundamental to human development and intelligence, although it is still unclear how we form this kind of knowledge. Deep Learning algorithms, on the other hand, are not very good at generalizing and fail to apply what they learn in one context to another. This is very limiting in medicine, where every diagnostic and prognostic task rests on a very complex network of causal connections.