RECOMB 2019 Day One Highlights: Digital Twins and Deep Structured Phenotype Networks

Source: Deep Learning on Medium

By Benjamin Lee

It’s a great coincidence that this year’s Research in Computational Molecular Biology (RECOMB) conference was held at George Washington University in Washington, D.C., just across the Potomac from In-Q-Tel’s offices. Today was the first day of the main track (there were a few satellite conferences over the past several days), and it was packed with interesting talks on topics ranging from statistical methods to sequencing to supercomputing.

Here are some of my highlights:

The day opened with a keynote by Alfonso Valencia of the Barcelona Supercomputing Center. If you haven’t heard of it, read Origin by Dan Brown: the book’s main action takes place in the chapel-turned-supercomputer. Prof. Valencia’s talk was about analyzing comorbidities (when multiple diseases are present at the same time, such as lung cancer and chronic obstructive pulmonary disease, both caused by smoking) using publicly available data, such as gene expression data. I learned something really interesting: diseases can be anticorrelated, where having one disease is associated with a lower chance of getting a different one, such as lung cancer and schizophrenia (although that link is far from proven). Understanding why diseases do and don’t show up together can help explain their underlying mechanisms. However, finding these correlations (and anticorrelations) across many diseases and large datasets is no simple task, which is where the supercomputing comes in.

The part of the talk that most excited me was the concept of simulated “digital twins” for medicine, built from a person’s own data. With a digital twin, the effect of treatments can be simulated to guide real-world treatment decisions in a continuous feedback loop. It’s going to take a lot of work to get there, but the combination of supercomputing, cheaper data collection, and better analysis has made this idea feasible.
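To make the comorbidity idea from the keynote a bit more concrete, here is a minimal sketch of one simple way to score whether two diseases co-occur more or less often than chance: the ratio of observed to expected co-occurrences in a set of patient records. This is not the method from the talk (which worked with data such as gene expression), and the function name, the input format, and the toy cohort with placeholder diseases "A", "B", and "C" are all made up for illustration.

```python
from itertools import combinations


def comorbidity_relative_risk(patients):
    """Return observed/expected co-occurrence for every pair of diseases.

    `patients` is a list of sets of diagnosis labels, one set per patient
    (a hypothetical input format chosen for this sketch).
    """
    n = len(patients)

    # Count how many patients have each disease.
    counts = {}
    for diagnoses in patients:
        for disease in diagnoses:
            counts[disease] = counts.get(disease, 0) + 1

    risks = {}
    for a, b in combinations(sorted(counts), 2):
        # Patients observed with both diseases.
        both = sum(1 for diagnoses in patients if a in diagnoses and b in diagnoses)
        # Co-occurrences expected if the two diseases were independent.
        expected = counts[a] * counts[b] / n
        risks[(a, b)] = both / expected
    return risks


# Made-up toy cohort of eight patients; "A", "B", "C" are placeholders,
# not real diseases or real data.
toy_cohort = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}, {"C"}, {"C"}, set(), set()]
for pair, rr in comorbidity_relative_risk(toy_cohort).items():
    print(pair, round(rr, 2))
```

A ratio above 1 suggests the pair shows up together more often than chance (a comorbidity), and a ratio below 1 suggests anticorrelation. The number of pairs grows quadratically with the number of diseases, and a real analysis would also need significance testing and confounder adjustment, which hints at why this gets computationally heavy.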

The other highlight of my day was a talk by Mark Gerstein on the molecular basis of psychiatric diseases (slides available here). This talk was a summary of this paper and included a line that caught my eye:

We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.

Their network, which they call a Deep Structured Phenotype Network (DSPN), is a modified deep Boltzmann machine that predicts phenotypes (observable traits) from genotypes (genetic data) by modeling the intermediate process by which DNA becomes traits. This is an exciting development because it’s interpretable and it performs better than the baseline models they tested.
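For context on the baseline named in the quote, a polygenic risk score is, in its simplest form, a weighted sum of how many risk alleles a person carries at each variant. Here is a minimal sketch of that sum. The function name and the numbers are hypothetical, and this shows only the baseline idea, not the DSPN or the paper’s actual pipeline.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts.

    dosages: number of risk alleles (0, 1, or 2) at each variant.
    weights: per-variant effect sizes, typically estimated in an
             association study. The values used below are made up.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical person genotyped at three variants.
score = polygenic_risk_score([0, 1, 2], [0.1, -0.2, 0.05])
print(score)
```

Because the score is purely additive, it has no notion of intermediate biology such as gene expression, which is the kind of structure the DSPN is described as modeling.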

I can’t wait for tomorrow’s text mining talks, titled “Identifying clinical terms in free-text notes using ontology-guided machine learning” and “RENET: A Deep Learning Approach for Extracting Gene-Disease Associations from Literature”. Stay tuned!

Lab41 is a Silicon Valley challenge lab where experts from the U.S. Intelligence Community (IC), academia, industry, and In-Q-Tel come together to gain a better understanding of how to work with — and ultimately use — big data.

Learn more at and follow us on Twitter: @_lab41