Deep Learning with Magnetic Resonance and Computed Tomography Images

Source: Deep Learning on Medium


Getting started with applying deep learning to magnetic resonance (MR) or computed tomography (CT) images is not straightforward; finding appropriate data sets, preprocessing the data, and creating the data loader structures necessary to do the work is a pain to figure out. In this post I hope to alleviate some of that pain for newcomers. To do so, I’ll link to several freely-available datasets, review some common/necessary preprocessing techniques specific to MR and CT, and show how to use the fastai library to load (structural) MR images and train a deep neural network for a synthesis task.

Overview of MR and CT images

Before we get into the meat and bones of this post, it will be useful to do a quick overview of the medical images that we’ll be talking about and some idiosyncrasies of the types of images in discussion. I’ll only be talking about structural MR images and (to a lesser degree) computed tomography (CT) images. Both of these types of imaging modalities are used to view the structure of the tissue; this is opposed to functional MR images (fMRI) or positron emission tomography (PET) scans which image blood flow activity and metabolic activity, respectively.

For people not acquainted with medical images at all, note that medical image statistics are different from natural image statistics. For example, a mammography image looks nothing like any picture that a human would take with their smart phone. This is obvious of course; however, I think it is important to have this in mind when designing networks and working with the data to make some sort of machine learning (ML) algorithm. That is not to say that using common networks or transfer learning from domains outside medical imaging won’t work; it is only to say that knowing the characteristics of common issues regarding medical imaging will help you debug your algorithm. I’ll discuss specific examples of these characteristics in the preprocessing section below and show ways to reduce the impact of some of these unique problems.

I’m not going to go into much detail about the intricacies of structural MR imaging. A good place to start for more in-depth details of MR is this website, which goes into depth regarding any topic that an ML practitioner working with MR would care about. I’ll note that there are many different types of MR images that an MR scanner can produce. For instance, there are T1-weighted, T2-weighted, PD-weighted, FLuid-Attenuated Inversion Recovery (FLAIR), among others. To make things more complicated, there are sub-types of those types of images (e.g., T1-weighted images come in the flavors: MPRAGE, SPGR, etc.). Depending on your task, this information may be extremely useful because of the unique characteristics of each of these types and sub-types of images. The reason for all these different types of images is because MR scanners are flexible machines that can be programmed to collect different information according to different pulse sequences. The upshot is that all of these images are not just redundant information; they contain useful and unique information regarding clinical markers that radiologists (or us as image processors) care about. Again, I’ll discuss more details regarding unique aspects of MR in the preprocessing section.

While there is contrast and non-contrast CT, there are not as many varieties of images that can be created with a CT scanner. Vaguely, the CT scanner shoots high-energy photons through you whose energy is calculated via a detector on the other side of your body which the photons hit. When images like this are taken from a variety of angles, we can use our knowledge of the geometry at which the images were acquired to reconstruct the image into a 3D volume. The physical representation of the energy lets us map the found intensity values to a standard scale which also simplifies our life and is discussed more in the preprocessing section. I should note that while MR is good at soft-tissue contrast (e.g., the ability to discern between gray-matter and white-matter in the brain), CT has somewhat poor soft-tissue contrast. See the below head scans from an MR image and a CT image as an example, noting the contrast between the grey-matter (along the outside of the brain) and white-matter (the brighter tissue interior to the grey-matter) as well as the general noise level present in the brain for both images.

Example of a CT scan (left) and (T1-weighted) MR scan (right) [from the Qure.ai and Kirby 21 dataset, respectively]

Some of the reasons that MR scans are not always used are: 1) some people can’t due to a variety of reasons (e.g., no access, certain types of metal implants, etc.), 2) MR scans take a relatively long time compared to CT scans and 3) radiologists are interested in the particular measurements that CT can provide (e.g., looking at bone structure). Now that we have a basic understanding of the data and some of the intricacies of the imaging modalities, let’s discuss some datasets.

Datasets

Labeled data is somewhat sparse for medical images because radiologists are expensive, hospitals are concerned about lawsuits, and researchers are (often overly) protective of their data. As a result, there is not an ImageNet-equivalent in MR or CT. However, there are many commonly used datasets depending on the application domain. Since I mostly work with brain MR images, I’ll supply a small list of easily accessible datasets for MR and CT (brain) images along with the data format in parenthesis at the end of the bullet:

  • Brainweb: Simulated normal and MS brains with tissue/lesion segmentations (MINC)
  • Kirby 21: Set of 21 healthy patients scanned twice (NIfTI)
  • IXI dataset: Set of 600 healthy subject scans (NIfTI)
  • Qure.ai CT head scan data: Set of 491 head CT scans with pathology [no segmentation, but radiology report] (DICOM)

Here is a list of not so easy to download (but very useful) datasets.

I have not worked with the set of datasets below, but I know people who have and am including them for completeness.

Another place to look for datasets is in OpenNeuro which is a repository for researchers to host their brain imaging datasets; it mostly consists of fMRI from what I can tell. If your passion lies somewhere besides MR and CT brain images, then I’m unfortunately not a great resource. My first guess would be to look at the “grand challenges” listed here and see if it is possible to gain access to the data.

Not to bury the lede too much, but perhaps the easiest way to get access to some of the above data is through this website. I’m not sure that everything is sanctioned to be on there, which is why I have delayed to bring this up.

Preprocessing

The amount of data wrangling and preprocessing required to work with MR and CT can be considerable. I’ll outline the bare necessities below.

The first thing to consider is how to load the images into python. The simplest route is to use nibabel. Then you can simply use

import nibabel as nib
data = nib.load('mydata.nii.gz').get_data()

to get a numpy array containing the data inside the mydata.nii.gz file. Note that I’ll refer to the indices of this 3D volume as a voxel which is the 3D-equivalent of a pixel for a 2D image. For work with brain images at least, I’d recommend always converting the files to NIfTI (which corresponds to the .nii or .nii.gz extension). I find converting everything to NIfTI first makes my life easier since I can assume all input images are of type NIfTI. Here is a tool to convert DICOM to NIfTI, here is a script to convert MHA to NIfTI and here is a script to convert PAR/REC files to NIfTI. There are more file formats that you’ll probably need to work with, and you can use some of those scripts as inspiration to convert those file types.

We’ll first outline resampling, bias-field correction and registration which are staples of any medical image analysis. For these preprocessing steps, I’d recommend ANTs and specifically the ANTsPy variety (assuming you are coming from a python background). ANTs is actively maintained and has reliable tools to solve all of these (and many more) problems. Unfortunately, ANTsPy is not always easy to install, but I believe work is being done on it to solve some of the issues and once you are up and running with it you can access most of the tools ANTs offers natively from python. In particular, it supports the resampling, bias-field correction and registration preprocessing steps I’ll be discussing next.

As with natural images, MR and CT images do not have a standard resolution or standard image size. I’d argue that this fact is of greater importance in MR and CT though and must be considered for optimal ML performance. Consider the following: you train a 3D convolutional neural network with data acquired at 1x1x3 mm³ resolution and then you input an image into the network with 1x1x1 mm³. I would expect the result to be sub-optimal since the convolutional kernels will not be using the same spatial information. This is debatable and I haven’t examined the problem closely, but the non-standard resolution is something to keep in mind if you run into problems at test time. We can naively address the non-standard resolution problem by resampling the image to a desired, standard resolution (with cubic B-splines, of course, for the best quality).

For many applications, both MR and CT often require a process called registration in order to align objects across a set of images for direct comparison. Why would we want to do this? Let’s say you want to learn a function that takes an MR image and outputs an estimate of what the CT image would look like. If you have paired data (that is, an MR and CT image from the same patient), then a simple way of approaching this problem would be to learn the voxel-wise map between the image intensities. However, if the anatomy is not aligned in the image space, then we cannot learn this map in a supervised way. We solve this problem by registering the images and, in fact, we examine this problem in the experiment section.

The next two problems (described in the next two paragraphs) are specific to MR. First is that we have inhomogeneous image intensities due to the scanner in MR images. Since this inhomogeneity is not a biological feature, we generally want to remove it and we do so with a process referred to as bias-field correction (I’ll discuss one solution in the experiment section).

Another issue in MR are inconsistent tissue intensities across different MR scanners. While CT images have a standard intensity scale (see Hounsfield units), we are not so lucky with MR images. MR images absolutely do not have a standard scale, and the impact on algorithm performance can be quite large if not accounted for in preprocessing. See the images below for an example where we plot the histograms of a set of T1-weighted MR images without any intensity normalization applied (see the image with “Raw” in the title). This variation is due to effects caused by the scanner and not due to the biology, which is the thing we generally care about.

Histograms of image intensities from a cohort of T1-weighted brain images (left: white-matter mean normalized, right: no normalization applied)

There are a litany of intensity normalization techniques that attempt to remove this scanner variation (several of which I have collected in this repository called intensity-normalization). The techniques range from the very simple (e.g., simple standardization which I’ll refer to as z-score normalization) to the fairly technical (e.g., RAVEL). For neuroimaging, a good combination of speed and quality can be found in the Fuzzy C-Means (FCM) normalization technique which creates a rough tissue-class segmentation between the white-matter (WM), grey-matter and cerebrospinal fluid based on the T1-weighted image. The WM segmentation mask is then used to calculate the mean of the WM in the image which is set to some user-defined constant. This normalization technique seems to almost always produce the desired result in brain images. If you are not working on brain images, then you may want to look at either the Nyúl & Udupa method or simple z-score normalization. All of these normalization methods are available as command-line interfaces (or importable modules) in the intensity-normalization repository.

The last preprocessing step we’ll consider is specific to brain images. In brain images, we generally only care about the brain and not necessarily the tissues outside of brain (e.g., the skull, fat and skin surrounding the brain). Furthermore, this extraneous tissue can complicate the learning procedure and trip up classification, segmentation, or regression tasks. To get around this we can use skull-stripping algorithms to create a mask of the brain and zero out the background. The simplest way to go about this (in MR) — with reasonable results — is with ROBEX: a command-line tool that generally does a good job at extracting the brain from the image. I’ve seen it fail a few times on some data containing large pathologies or imaging artifacts, but other than that it is usually good enough for most machine learning tasks. For what it’s worth, I’d try to avoid skull-stripping your data since it is just another point of possible failure in your preprocessing routine, but sometimes it substantially helps.

Since MR and CT images aren’t standard like JPEG, your computer doesn’t have a native way to display it. If you want to visualize your data, take a look at MIPAV for non-DICOM images (e.g., NIfTI) and Horos for DICOM images. It is always good to look at your data, especially after preprocessing so we can verify that everything looks reasonable. For instance, perhaps the registration failed (it often does) or perhaps the skull-stripping failed (again, it often occurs). If you pipe your crappy data into your ML algorithm, you’re probably going to get crappy output and you’ll waste a lot of time doing unnecessary debugging. So be kind to yourself and examine the data.

Training a deep network for MR or CT applications

While deep neural networks applied to MR and CT are increasingly moving to 3D models, there has been good success with 2D models. If you have limited memory on your GPU or you have very limited training data, you may want to use a 2D network to squeeze the most performance out of the network. If you use a 3D network, you will quickly run into memory issues when passing a full image or patches through the network.

If you decide a 2D network is the way to go for your application (a reasonable choice), you’ll need to figure out/design a data loader to handle this. After fussing around with complicated data loaders that take the 3D image to a 2D image patch or slice for a while, I realized that that was all an unnecessary burden that made it harder to use pre-built data loader/data augmentation tools that aid in training. Thus my recommended solution to this problem is to simply convert the 3D volumes to 2D images. Since the original volumes are floating point numbers, I went with the TIFF image format which supports such types. Here is a command-line script which takes a directory of NIfTI images and creates a directory of corresponding 2D TIFF images (with some options to create slices based on axis and to only create slices from a portion of the image in order to avoid background slices).

In the following section, I’ll build a deep neural network with 3D convolutional layers. I’m doing this as opposed to using 2D convolutional layers because — once you convert the 3D volume to 2D images like TIFF — you can basically just use any 2D architecture you have lying around substituting the head for the appropriate application. If you are only interested in 2D models, then you can check out the niftidataset repository here for a hint on how to get up and running with fastai using TIFF images. Since the 3D problem is slightly more tricky to approach, I’ll dig into it below.

Experiment

In this section, I’ll outline the steps required to train a 3D convolutional neural network for a MR-to-MR synthesis task using pytorch and fastai. If you just want to look at the code, then there is also a notebook which contains most of the experiment (excluding preprocessing) here.

The setup is as follows: we’ll train a very small resnet to take an entire 3D volume from one MR contrast to another MR contrast; we’ll be learning the transform to map T1-weighted images to FLAIR images. This task is called MR image synthesis and we’ll refer to the network as a synthesis network. There are a variety of applications for this type of synthesis, but motivation for this problem is mostly that: MR scan time is limited, so not all contrasts can be collected. But we want to eat our cake and have it too, and we sometimes want those uncollected contrasts for image processing purposes. Thus we create some fake data using the data that actually was collected, where the fake data will be the result of our synthesis network.

In this experiment, I’ll be using 11 and 7 images as training and validation, respectively, from the Kirby 21 dataset. All images have been resampled to 1x1x1 mm³, bias-field corrected using N4, and the FLAIR images have been (affine) registered to the T1-weighted images using ANTsPy. Look here and here for the actual code I used to do the preprocessing (both are available as command-line interfaces when the intensity-normalization package is installed along with ANTsPy). Finally, all the images were individually z-score normalized using the entire image.

Now that we have motivated the problem somewhat and talked about the data we will use, let’s get to the code. The code block below defines some necessary constructs to work with fastai, specifically to use the data_block API.

Dataloaders classes and functions for NIfTI images in fastai

There is nothing particular to remark on here, except that once you figure out how to setup these types of structures, they are quite convenient (see the ItemList tutorial for more details). Note that not all functionality is supported with the current setup — I stripped it down to make it as simple as possible — but it’ll get the job done. I’ll show how this creates the training and validation dataloaders below. First, let’s define a preprocessing transform:

Relevant data transforms for our application

Why am I defining this odd cropping function? The reason is two-fold. The first reason is that the neck is not present in the FLAIR images but is present in the T1-weighted images. I don’t want the network to learn to take tissue to zero, so I’m removing that part of the data by only using the data in the 20–80 percent range along the axis corresponding to the axial plane. The second reason is that I can fit twice as many samples into a batch (that means a batch size of 2). The reason for the small batch size is that, like I previously mentioned, 3D networks with large images are memory-intensive. Why no other data augmentation? Unfortunately, 3D transforms are not natively supported with pytorch or fastai so I’d have to incorporate my own and I am not doing this for simplicity. Now let’s use the data_block API of fastai to create the training and validation dataloaders:

Define the dataloader with the data block API in fastai

You can see the notebook for more details, but essentially I have the T1-weighted images in one directory with train, valid, test subdirectories and a parallel directory with the FLAIR images. The get_y_fn function grabs the FLAIR image corresponding to the source T1-weighted image. Look here for more in-depth explanation of the remaining commands. Note that the (tfms,tfms) means that I am applying the previously defined crop to both the training and validation set. Applying that transform to the validation set isn’t ideal, but is required because of memory-constraints. Now let’s create some 3D convolutional and residual block layers which we’ll use to define our model:

Define some custom 3D layers

I’m closely following the definition of the 2D convolutional and residual block layers as defined in the fastai repository. As a side note, I left the spectral normalization and weight normalization routines in the conv3d definition, but disappointingly received worse results with those methods than when using batch norm (and I’m still not sure whether batch norm is applied before or after the activation). Now let’s define our model using the above layers:

Define our network

Here I have just defined the very small resnet model. Why so few layers? I am using as large of a network as my GPU can contain in memory. The creation of many channels with the entire 3D volume and the residual connections are burdens on the GPU memory. The only other thing of possible intrigue is that I use a 1x1x1 kernel at the end, which empirically produces crisper images (and I think is fairly standard). As a note, I realize that I should have removed the activation from the final layer; however, it is not a problem because I am z-score normalizing (i.e., mean subtracting and dividing by the standard deviation) the images with their backgrounds. The backgrounds, which are approximately zero, take up the vast majority of the volume of the image. Thus the z-score normalization essentially puts the background (corresponding to the mean) at zero, which makes the intensities of the head greater than zero. A fine result for the ReLU. Now let’s train this network:

Training

Again fairly normal. Mean square error is used because we want each voxel intensity in our source T1-weighted image to match the voxel intensity of the corresponding target FLAIR image. We use lr_find to help us pick a larger learning rate (as described here) for faster training, in addition to using the one-cycle policy. I always collect my training and validation data into a CSV file to look at how the network is converging, especially on machines where launching a jupyter notebook is a pain. I picked 100 epochs because I ran this a couple times and did not notice a great amount performance gain with more epochs.

After training completes, we input an entire image (not seen in either training or validation) into the network. An example is shown in below figure, where the synthesis result is the right-most image (with the title “Syn”).

Example synthesis result with the network defined above (from left to right: the input T1-weighted image, the true FLAIR image, and the synthesized FLAIR image)

While the above figure could have been had better window/level settings for better comparison, we see that the T1-weighted image does take on many of the characteristics of the true FLAIR image. Most notably, inside the brain, we see that the white-matter becomes less bright than the grey-matter while the cerebrospinal fluid remains dark. The noise characteristics are not the same though and there are some bright spots in the true FLAIR that are not captured in the synthesized image.

This result is not state-of-the-art by any means, but it’s interesting to see that we can learn an approximate transform with such an incredibly small dataset, no data augmentation, and a very small network. This network would assuredly be better with more data, data augmentation and a larger network, but this is just a simple, pedagogical toy example. I should note that unless you have a particularly large GPU (and contradicting my last statement), you may not be able to train this network with the full images. You’ll probably have to use either 3D patches or 2D slices (or 2D patches). For what it’s worth, I’ve collected some of these types of transforms in the niftidataset python package, specifically in the niftidataset.fastai module.

Conclusion

Hopefully this post provided you with a starting point for applying deep learning to MR and CT images with fastai. Like most machine learning tasks, there is a considerable amount of domain-specific knowledge, data-wrangling and preprocessing that is required to get started, but once you have this under your belt, it is fairly easy to get up-and-running with training a network with pytorch and fastai. Where to go from here? I’d download a dataset from one of the links I posted in the Datasets section and try to do something similar to what I showed above, or even just try to recreate what I did. If you can get to that stage, you’ll be in a comfortable place to apply deep learning to other problems in MR and CT.

I should note that there is work being done to create standard code bases from which you can apply deep learning to MR and CT images. The two that I am aware of are NiftyNet and medicaltorch. NiftyNet abstracts away most of the neural network design and data handling, so that the user only has to call some command-line interfaces by which to download a pre-trained network, fine-tune it, and do whatever. So if that is good enough for your needs, then go right ahead; it seems like a great tool and has some pre-trained networks available. medicaltorch provides some dataloaders and generic deep learning models with medical images in pytorch. I have not tested either extensively, so I cannot comment on their utility.

If you don’t like python, there is neuroconductor in R or NIfTI.jl and Flux.jl packages in Julia which can read NIfTI images and build neural networks, respectively. There are countless other relevant software packages, but those are the ones the first come to mind and that I’ve worked with.

As a final note, if you have luck creating a nice application for MR or CT make sure to share your work! Write a paper/blog post, put it up on a forum, share the network weights. It would be great to see more people apply deep learning techniques to this domain and push the boundary of the field where possible. Best of luck.