The Multi-Modal Approach For ML

By Danilo Pena

As data scientists and machine learning practitioners, we get to tackle interesting problems through a data-driven lens. For me, at least, that lens can get pretty one-track minded, to the point where it sometimes feels as though we aren’t leveraging the right data for the project in front of us.

I conduct machine learning research on neurodegenerative diseases such as Alzheimer’s and Parkinson’s using brain imaging, and we typically work with a single imaging modality (that is, one type or form of imaging).

There are many imaging modalities, from magnetic resonance imaging (MRI) to computed tomography (CT) to diffusion tensor imaging (DTI) to functional MRI (fMRI). The list goes on, and it will keep growing as imaging technologies advance to let doctors see things like iron deposits, amyloid plaques, or dopamine transporters within the brain, all of which could be potential biomarkers for better understanding these diseases.

In addition to these imaging modalities, there are other types of healthcare data I could potentially use: electronic health record data (a mix of structured and unstructured), clinical diagnoses captured as specific codes (more structured), and genetic data (structured but wildly high-dimensional).

And, as you can imagine, there are many increasingly popular types of data that I personally don’t have access to: wearable data, social network data, socioeconomic data, and so on.

This wealth of data is mind-boggling and somewhat anxiety-inducing, but it is definitely an interesting problem set for our kind.


There are many researchers out there leveraging these unique types of data, including our group. If we start with imaging, we know that deep learning has made great strides in computer vision. If we look at clinical notes, deep learning has driven real progress in natural language processing.

But has our community made strides in truly understanding what it means to take a multi-modal approach? Sure, we can chuck each type of data through its own network, concatenate the outputs, and run them through a dense layer for a classification prediction.

“Just experiment and see what happens” is typically the mindset now.
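
To make that recipe concrete, here is a minimal late-fusion sketch in PyTorch. Everything in it is a hypothetical illustration: the toy 3D CNN for imaging volumes, the MLP for tabular features such as genetic or EHR data, and all of the layer sizes are stand-ins, not any particular published model.

```python
# A minimal late-fusion sketch (hypothetical shapes and layer sizes).
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        # One encoder per modality: a tiny 3D CNN for imaging volumes...
        self.image_encoder = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # -> (batch, 8, 1, 1, 1)
            nn.Flatten(),             # -> (batch, 8)
        )
        # ...and an MLP for tabular features (e.g., genetic or EHR data).
        self.tabular_encoder = nn.Sequential(
            nn.Linear(100, 32),
            nn.ReLU(),
        )
        # Concatenate the embeddings and classify with one dense layer.
        self.head = nn.Linear(8 + 32, n_classes)

    def forward(self, image, tabular):
        z = torch.cat([self.image_encoder(image),
                       self.tabular_encoder(tabular)], dim=1)
        return self.head(z)

model = LateFusionClassifier()
logits = model(torch.randn(4, 1, 32, 32, 32),  # fake MRI volumes
               torch.randn(4, 100))            # fake tabular features
print(logits.shape)  # torch.Size([4, 2])
```

Notice that the only place the modalities meet is a single concatenation followed by one linear layer; nothing in this design models how they relate to each other.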

But does this do the data justice? I don’t think so. Nor do I think it will get us to a better understanding of how these data are interrelated, how they correlate, or how they cluster into groups depending on different variables.

We need better methods for extracting useful, actionable information when we combine uniquely important types of data, so that we are driving insight rather than merely solving the next problem in front of us.

We need to understand how changes across different regions of the brain correlate with different genetic hotspots, or why a person who uses a certain kind of slang or language is more likely to progress further into depression. We need to understand how the social determinants of health, at a granular level, affect the larger, complex system that is our brain, and vice versa.
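
For the imaging–genetics question in particular, one classical starting point is canonical correlation analysis (CCA), which finds paired linear combinations of two feature sets that are maximally correlated. Here is a hedged sketch using scikit-learn, with random matrices standing in for regional brain measures and genetic variant scores; the subject and feature counts are made up for illustration.

```python
# Cross-modal correlation via canonical correlation analysis (CCA).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
brain = rng.standard_normal((200, 50))   # 200 subjects x 50 brain regions
genes = rng.standard_normal((200, 300))  # 200 subjects x 300 variant scores

cca = CCA(n_components=2)
brain_c, genes_c = cca.fit_transform(brain, genes)

# Correlation of each paired canonical variate: how strongly a weighted
# pattern of brain regions tracks a weighted pattern of variants.
for i in range(2):
    r = np.corrcoef(brain_c[:, i], genes_c[:, i])[0, 1]
    print(f"component {i}: r = {r:.2f}")
```

On real data, the learned weight vectors (cca.x_weights_ and cca.y_weights_) are what point at which brain regions move together with which genetic hotspots.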

We need a true multi-modal approach in machine learning, especially if we are going to use these tools to alter our lives.

Thanks for reading.