Getting to the Heart of it: How Deep Learning is Transforming Cardiac Imaging

Written by Suvadip Paul and Jessica Wetstone

Cardiac “self-image” issues. From Heart and Brain, a comic series by Nick Seluk.

An estimated 17.7 million people died from CVDs [cardiovascular diseases] in 2015, representing 31% of all global deaths. Of these deaths, an estimated 7.4 million were due to coronary heart disease and 6.7 million were due to stroke — [1]

Cardiovascular diseases are the number one cause of death globally. For those affected, early detection is critical for both management and treatment. One of the leading diagnostic tools in this area is cardiac imaging — including magnetic resonance (MR), ultrasound, and computed tomography (CT). Cardiac imaging can assess both heart anatomy and function and aid in the detection of various heart-related pathologies, such as coronary artery disease, cardiac masses, and congenital heart disease [2]. Automated approaches to processing these images are in high demand: they promise to alleviate the burden on radiologists while improving both diagnostic accuracy and efficiency.

Radiologists today face an ever-increasing number of medical images to review. In a sense, the medical imaging community is a victim of its own success: this explosion in images is primarily due to improvements in medical imaging technologies, which have meant both more images per individual scan and greater demand for scans as a diagnostic tool. A Mayo Clinic study comparing radiology workloads in 1999 and 2010 found a 1,300% increase in the number of CT images interpreted and a 540% increase in the number of MR images — growth driven primarily by the average number of images per scan, which for CT rose from 82 images in 1999 to 679 in 2010 [3].

One of the most active areas of research in applying deep learning to cardiac imaging is in segmentation: the task of identifying which pixels in a medical image correspond to the contour or interior of a particular region of interest, e.g. isolating the outline of a particular organ from an MRI. Not only can quantitative metrics be derived immediately from the size and volume of the segmented areas, but segmentation is also often an important pre-processing step ahead of pathology detection [4].

This post will review recent successes in the application of deep learning techniques to cardiac imaging segmentation tasks. In particular, we’ll aim to answer:

  • Why segmentation matters: What is one specific segmentation task within cardiac imaging, and why is it clinically important?
  • Enter deep learning: How does deep learning formulate the segmentation problem?
  • What has been achieved: What is the current performance of deep learning models on segmentation tasks? What recent advancements are the most promising?
  • What comes next: What future directions is this research likely to take?

Why segmentation matters

Let’s say your doctor is concerned that you might be suffering from heart disease. As a diagnostic tool to assess your cardiac function, she orders a cardiovascular MRI — a noninvasive test that uses a powerful magnetic field and radio waves to create cross-sectional images of your heart. A cine MRI (as in “cinema”) repeatedly captures images of the same heart slice over the course of several cardiac cycles before moving on to the next slice. Each individual image taken over time is called a “frame”.

Front view of the heart.

When a radiologist reviews the MRI to get a sense of your heart’s health, one of the quantitative metrics calculated is the ejection fraction (EF): the percentage of blood that is pumped out of the left ventricle during each heartbeat. A healthy human has an EF of around 55–70% [5]. To derive this metric, the radiologist must calculate the volume of the left ventricle (LV) at two time-points within the cardiac cycle: the end of systole (when the heart muscle contracts) and the end of diastole (when the heart refills with blood). EF is then calculated as (EDV − ESV)/EDV, where ESV is the end-systolic volume and EDV is the end-diastolic volume [6].
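In code, the EF formula is a one-liner. Here is a minimal Python sketch; the example volumes are illustrative, not drawn from any study:

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction as a percentage, given the end-diastolic
    volume (EDV) and end-systolic volume (ESV), both in mL."""
    if edv_ml <= 0 or esv_ml < 0 or esv_ml > edv_ml:
        raise ValueError("volumes must satisfy 0 <= ESV <= EDV, EDV > 0")
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# A healthy example: EDV = 120 mL, ESV = 50 mL gives EF of about 58%
print(round(ejection_fraction(120, 50), 1))  # 58.3
```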

Manual contouring [of the ventricles] can take upwards of 30 minutes per case [7]

Finding the volume of the left ventricle begins with outlining its contour within each end-diastole and end-systole frame. This process is a segmentation task, typically completed manually by the radiologist (with the help of software packages like Neosoft). Manual contouring is time-consuming; it has been estimated to take upwards of 30 minutes per case [7]. A fully automated tool could alleviate this burden — completing the left ventricle segmentation automatically and providing the results (including EF, among many other indications) either to the radiologist and cardiologist for further review, or directly to a computer-aided diagnostic package.

The segmented left ventricle from a cine MRI series, and the output of one of the models discussed below. Red: the endocardium; green: the epicardium. [9]

Segmentation tasks in cardiac imaging are limited neither to the left ventricle nor to MR scans. Other modalities for cardiac imaging include ultrasound and CT. Like MR, ultrasound is used to evaluate cardiac anatomy and function, whereas CT is mainly used for coronary artery and aortic evaluation. Segmentation tasks in these modalities include coronary centerline extraction (isolating the main branch of the coronary artery) in CT scans and aortic valve segmentation in ultrasounds.

Enter deep learning

Deep learning — and in particular Convolutional Neural Networks (CNNs) — has recently emerged as the dominant approach in computer vision. The year 2016 saw the rise of fast real-time detection and classification on images (YOLO), driven by both developments in CNNs and more powerful GPUs. Due to deep learning’s success in general computer vision, it’s unsurprising that recent research in cardiac imaging applies variations on the CNN architecture to the segmentation task.

Here is how a typical approach is formulated:

Inputs: The input to the model is the MR/US/CT scan that contains the region of interest to be segmented. The dimensionality of this input can vary; the simplest models work on individual 2D images (e.g. a single MR slice), while others take as input a stack of 2D images or even a 3D volume.

Example CNN for both LV and RV segmentation [9]

Sample architectures: The network depicted above combines the tasks of localization (finding the LV within the image) and segmentation by scanning the entire image for pixels that correspond to the region of interest. Other systems approach this as a two-step problem (see image below): a first CNN recognizes the region of interest and draws a bounding box around it, and a second CNN then performs the segmentation task, restricting its search for the LV to within the box identified by the first network.

The output of the detection (localization) task is used as input to the segmentation task [14]

Outputs: Scans are annotated with either an outline or mask of the region of interest. During supervised training of the network, these predicted output masks are then compared against ground truth: the same scans where contours have already been identified by human experts.
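As a concrete (toy) illustration of the training signal, one common choice of per-pixel loss is binary cross-entropy between the predicted probability map and the ground-truth mask. The 4×4 masks below are our own invention; real systems use framework-provided losses over much larger images:

```python
import numpy as np

def pixelwise_bce(pred_probs, true_mask, eps=1e-7):
    """Mean binary cross-entropy between a predicted per-pixel
    probability map and a binary ground-truth mask (same shape)."""
    p = np.clip(pred_probs, eps, 1 - eps)
    return float(-np.mean(true_mask * np.log(p)
                          + (1 - true_mask) * np.log(1 - p)))

# Toy example: the network is confident and correct at every pixel
true_mask = np.array([[0, 0, 0, 0],
                      [0, 1, 1, 0],
                      [0, 1, 1, 0],
                      [0, 0, 0, 0]], dtype=float)
pred = np.where(true_mask == 1, 0.9, 0.1)  # per-pixel probabilities
print(pixelwise_bce(pred, true_mask))      # small loss, about 0.105
```

Training then drives this loss down by gradient descent, sharpening the predicted mask toward the expert contours.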

What has been achieved

Two notable, successful modifications to the standard convolutional network and training procedure are:

  1. Utilizing information from adjacent MRI slices to capture inter-slice correlations and improve model accuracy and efficiency [8]
  2. Employing transfer learning to improve training efficiency [9]

How to measure success

One of the primary evaluation metrics for segmentation tasks is the Dice index, a measure of how well two contours overlap. The Dice index ranges from 0 (complete mismatch) to 1 (perfect match). In testing a supervised learning model, the contours produced by the model are compared to the expert-drawn contours. Dice indices as high as 0.96 (end-diastole frames) and 0.94 (end-systole frames) have been reported for the left ventricular endocardium segmentation task [10] — meaning that the predicted contours are very close to the human-expert ground truth. The same study reports Dice indices of (0.94, 0.87) and (0.89, 0.90) for the right ventricular endocardium and left ventricular epicardium segmentation tasks.

The Dice index measures the similarity between two contours
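For two binary masks A and B, the Dice index is 2|A∩B| / (|A| + |B|). A minimal numpy sketch, with toy 6×6 masks of our own invention standing in for expert and model contours:

```python
import numpy as np

def dice_index(mask_a, mask_b):
    """Dice index 2|A∩B| / (|A| + |B|) between two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return 2.0 * np.logical_and(a, b).sum() / denom

expert = np.zeros((6, 6), dtype=int)
expert[1:5, 1:5] = 1           # expert contour: 4x4 square, 16 pixels
model = np.zeros((6, 6), dtype=int)
model[2:6, 2:6] = 1            # model contour: same size, shifted by 1

# Overlap is a 3x3 square (9 pixels): 2*9 / (16 + 16) = 0.5625
print(dice_index(expert, model))
```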

Utilizing information from adjacent MRI slices

In Poudel et al. (2016), which focuses on the left ventricle segmentation task, the input to the network is the full stack of 6–12 adjacent slices from either the end-systole or end-diastole phase of the cardiac cycle. This differs from the majority of previous work in this area, where inputs and outputs were restricted to single 2D MRI slices.

The full stack of MRI slices (left) with left-ventricular masks (right) [8]

Poudel et al. implement a standard CNN architecture that operates on each slice independently; however, the network is augmented with a Gated Recurrent Unit (GRU), which allows it to maintain memory of previous inputs. With the GRU, the prediction for each slice is a function of both the current slice and all previous slices in its stack, rather than of the current slice alone. The authors call the resulting model a Recurrent Fully Convolutional Network (RFCN) [8].
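To see the recurrence in isolation, here is a toy numpy sketch of a GRU stepping through a stack of slice-level feature vectors. The dimensions and random weights are ours purely for illustration; in the actual RFCN the GRU sits between the convolutional encoder and decoder and operates on feature maps rather than vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

d = 8  # size of each per-slice feature vector (toy value)
# Randomly initialized GRU parameters (illustration only)
Wz, Uz = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wr, Ur = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wh, Uh = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def gru_step(x, h):
    """One GRU update: blend the previous state with a candidate
    state, gated by the content of the current slice."""
    z = sigmoid(Wz @ x + Uz @ h)            # update gate
    r = sigmoid(Wr @ x + Ur @ h)            # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))
    return (1 - z) * h + z * h_cand

# A "stack" of 10 slices, each already reduced to a feature vector
slices = rng.normal(size=(10, d))
h = np.zeros(d)
for x in slices:   # the state after slice t has seen slices 1..t
    h = gru_step(x, h)
print(h.shape)     # (8,)
```

The key property is visible in the loop: the hidden state `h` after any slice summarizes everything seen so far, so a per-slice prediction built from it can exploit inter-slice correlations.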

Employing transfer learning

Tran (2016) uses transfer learning to improve the performance of a network on the right ventricle (RV) segmentation task in cardiac MRIs. Problems with right ventricle function are associated with cardiac diseases such as pulmonary hypertension, cardiomyopathy, and dysplasia [11]. However, RV segmentation has historically received less research attention than LV segmentation, due to a lack of publicly available training data. Capitalizing on the similarity of the two tasks and the greater prevalence of LV training data, Tran first trains a CNN on a left ventricle dataset to recognize LV contours, then uses that network’s weights as the starting point for training a network to segment the right ventricle.

Comparing a network initialized in this way against a randomly initialized network trained on the same RV dataset, the network initialized with the transferred weights performs better on the Dice index and other evaluation metrics alike. These results suggest that transfer learning could be very promising for other anatomical application areas, taking advantage of organ symmetry as well as other similarities between tasks [9].
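Mechanically, the transfer step amounts to initializing one network’s parameters from another’s before fine-tuning. A toy numpy sketch, where the layer names, shapes, and “pretrained” weights are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical layer shapes: (out_channels, in_channels, kH, kW)
layer_shapes = {"conv1": (16, 1, 3, 3), "conv2": (32, 16, 3, 3)}

def random_init():
    """Fresh random weights for every layer."""
    return {name: rng.normal(scale=0.05, size=shape)
            for name, shape in layer_shapes.items()}

# Stand-in for a network already trained on the larger LV dataset
lv_weights = random_init()

# Baseline: train the RV network from scratch (random start)
rv_from_scratch = random_init()

# Transfer learning: start the RV network from the LV weights,
# then fine-tune on the (smaller) RV dataset
rv_transfer = {name: w.copy() for name, w in lv_weights.items()}
```

The two RV networks then see identical training data; only the starting point differs, which is what the comparison above isolates.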

What comes next

Waiting for more data

Although a large number of cardiac scans are performed daily, these images are not typically made available to researchers. Unlike fields where data is abundant and easily found publicly, medical imaging datasets are scarce, primarily due to privacy restrictions. Indeed, the most active area of research in cardiac imaging — ventricle segmentation — likely owes much of its activity to public challenges that made labeled datasets available to researchers (MICCAI 2009, MICCAI 2012).

In other fields, deep models have been shown to perform image processing tasks incredibly well — in the presence of enough data, since deep learning models rely on large, varied training sets to extract higher-level features. For example, radiologist-level detection of pneumonia has already been achieved [14], but with the benefit of a training dataset containing over 112,000 images. More data also makes a model more robust to the inter-observer variability introduced when ground-truth labels are created: in the LV segmentation case, expert-drawn contours may differ across radiologists, so a model trained on few examples could end up learning individual radiologists’ biases. We expect most models in this area to improve once they have access to more data.

Beyond Segmentation

Segmentation is far from the only problem in cardiac imaging. Pathology detection and classification are other important problems in this space; one example is measuring the amount of calcium accumulated in the coronary arteries, a key indicator of coronary artery disease. A cardiac CT scan is a common way to measure this accumulation, a procedure known as “calcium scoring”. Here, deep learning’s ability to quantify calcium build-up even in CT scans not typically used for this task [12, 13] means that patients can be exposed to less radiation while achieving the same diagnostic result.

Our hope is that in the future, more and more of the perceptual tasks of segmentation, detection, and classification can be automated — relieving radiologists from their medical image overload, and ultimately improving clinical outcomes for cardiac patients.

Acknowledgements

We would like to thank Matthew Lungren MD MPH, Assistant Professor of Radiology at the Stanford University Medical Center for his review and feedback. We also want to thank Pranav Rajpurkar, Jeremy Irvin, Norah Borus, Henrik Marklund, and Erik Jones from the Stanford ML Group for their comments.

References

[1] Cardiovascular diseases (CVDs), World Health Organization. http://www.who.int/mediacentre/factsheets/fs317/en/

[2] Cardiac MRI Indications, University of Virginia School of Medicine. https://www.med-ed.virginia.edu/courses/rad/cardiacmr/Indications/Indications.html

[3] McDonald R., Schwartz K., Eckel L., Diehn F., Hunt C., Bartholmai B., Erickson B., Kallmes, 2015. The Effects of Changes in Utilization and Technological Advancements of Cross-Sectional Imaging on Radiologist Workload. Acad Radiol, 22:1191–1198

[4] Litjens, G., Kooi, T., Ehteshami Bejnordi, B., Arindra Adiyoso Setio, A., Ciompi, F., Ghafoorian, M., van der Laak, J., van Ginneken, B., Sanchez, C., 2017. A Survey on Deep Learning in Medical Image Analysis. arXiv:1702.05747.

[5] Ejection Fraction Heart Failure Measurement. http://www.heart.org/HEARTORG/Conditions/HeartFailure/DiagnosingHeartFailure/Ejection-Fraction-Heart-Failure-Measurement_UCM_306339_Article.jsp#.WnjTuJM-dZh

[6] Simpson’s Rule for Measuring Volumes, University of Virginia School of Medicine. https://www.med-ed.virginia.edu/courses/rad/cardiacmr/Pathology/CAD/Simpson.html

[7] Lieman-Sifry, J., Le, M., Lau, F., Sall, S., Golden, D., 2017. FastVentricle: Cardiac Segmentation with ENet. arXiv:1704.04296.

[8] Poudel, R. P. K., Lamata, P., Montana, G., 2016. Recurrent fully convolutional neural networks for multi-slice MRI cardiac segmentation. arXiv:1608.03974.

[9] Tran, P. V., 2016. A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv:1604.00494.

[10] Zotti, C., Luo, Z., Humbert, O., Lalande, A., Jodoin, P., 2017. GridNet with automatic shape prior registration for automatic MRI cardiac segmentation. arXiv: 1705.08943.

[11] Luo, G., An, R., Wang, K., Dong, S., Zhang, H., 2016. A Deep Learning Network for Right Ventricle Segmentation in Short-Axis MRI. 10.22489/CinC.2016.139-406.

[12] Wolterink, J. M., Leiner, T., de Vos, B. D., van Hamersvelt, R. W., Viergever, M. A., Isgum, I., 2016. Automatic coronary artery calcium scoring in cardiac CT angiography using paired convolutional neural networks. Med Image Anal 34, 123–136.

[13] Lessmann, N., Isgum, I., Setio, A. A., de Vos, B. D., Ciompi, F., de Jong, P. A., Oudkerk, M., Mali, W. P. T. M., Viergever, M. A., van Ginneken, B., 2016. Deep convolutional neural networks for automatic coronary calcium scoring in a screening study with low-dose chest CT. In: Medical Imaging. Vol. 9785 of Proceedings of the SPIE. pp. 978511-1 – 978511-6.

[14] Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., Lungren, M., Ng, A., 2017. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv:1711.05225

[15] Ghesu, F. C., Krubasik, E., Georgescu, B., Singh, V., Zheng, Y., Hornegger, J., Comaniciu, D., 2016b. Marginal space deep learning: Efficient architecture for volumetric image parsing. IEEE Trans Med Imaging 35, 1217–1228.


Getting to the Heart of it: How Deep Learning is Transforming Cardiac Imaging was originally published in Stanford AI for Healthcare on Medium.
