Deep Learning in Medical Imaging VI

Deep Learning for Precision Medicine. Courtesy of GE Medical

Training a FCN-8s with Caffe on a custom CT dataset

In this post, I will demonstrate how to train a fully convolutional network (FCN) to segment subcutaneous adipose tissue found in computed tomographic (CT) abdominal images using the Caffe deep learning framework.

In Chapter IV, I presented a tutorial on how to train a fully convolutional neural net (i.e. FCN-8s) for the segmentation of anatomical regions of interest (ROI) from CT images using NVIDIA’s Deep Learning GPU Training System (DIGITS). DIGITS uses Caffe under the hood for its FCN segmentation models and is an outstanding prototyping tool as it enables users to optimize the training of complex AI models with a minimal learning curve. I say minimal in comparison to the relatively steep learning curve associated with popular deep learning frameworks such as Caffe or TensorFlow.

AI researchers obviously have to master the tools of their trade. However, the goal of AI research is to develop optimized learning models in a finite time. AI prototyping tools are invaluable as they foster ease of experimentation with various deep neural networks (DNN) including their associated model parameters and optimization algorithms. In my opinion, we are going to need a whole lot more of them in the years to come…

However, DIGITS was not ideal for running inference testing to predict ROI segmentation on new CT images. That is, if I wanted to feed my trained FCN-8s a new CT image to segment, I had to use the DIGITS web interface. DIGITS displays the results on its web interface, but I could find no option to download the predictions to a local drive for further analysis. I would have to train the model the old-fashioned way. Enter Caffe.

From the Caffe website: Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR) and by community contributors. Caffe provides many tools and demos for the design and implementation of deep neural nets and can be run on either CPU or GPU hardware. Simply put, if you want to be part of the deep learning revolution, mastering Caffe (and other deep/machine learning frameworks and libraries) is a must. However, most deep learning tutorials redundantly show how to use Caffe on publicly available, curated image datasets. What if you want to train your model on your own imaging data? What if you wish to add your own customized Python Caffe layer to your neural network? What if you wish to modify your FCN to segment one class of objects instead of 1000? What follows is a quick and dirty tutorial that will answer the aforementioned questions. Clinical significance, data augmentation techniques, and FCN-8s’s transfer learning component were already discussed in Chapter IV.

Creating an LMDB from a Custom Dataset

A Lightning Memory-Mapped Database (LMDB) is a high-performance transactional database with a key-value store architecture and is one of several data model options that Caffe utilizes. LMDB’s software library creates memory-mapped files from custom image and label data, providing favorable I/O performance for large datasets. Fortunately, Caffe comes standard with a handy executable called convert_imageset that converts training data into an LMDB. Alternatively, Python has a handy LMDB library if you wish to write your own converter.

Caffe’s default convert_imageset has optional flags such as shuffle that are frequently used for object classification and detection. However, they are not necessary for this pixel-wise FCN segmentation task. The training and validation data were resized during data preprocessing, obviating the need for the resize_height and resize_width flags. This particular FCN-8s requires grayscale labels, which is why convert_imageset’s gray flag is needed.

The following script template calls convert_imageset 4 times to create 4 mdb files from the training/validation data which will be saved in folders named training_images_lmdb, training_labels_lmdb, validation_images_lmdb, and validation_labels_lmdb:

Figure 1. Caffe LMDB creation script
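
As a rough stand-in for the script in Figure 1, here is a minimal Python sketch that only assembles the four convert_imageset command lines rather than running them. The tool location under build/tools, the input folder layout, and the list-file names are assumptions made for illustration, not the original script. Applying --gray only to the label databases is likewise an assumption based on the grayscale-label requirement described above.

```python
def convert_imageset_cmd(caffe_root, image_dir, list_file, db_dir, gray=False):
    """Build the argument list for one convert_imageset call (nothing is executed)."""
    # Assumed location of the tool in a standard Caffe build tree.
    tool = caffe_root.rstrip("/") + "/build/tools/convert_imageset"
    cmd = [tool, "--backend=lmdb"]
    if gray:
        cmd.append("--gray")
    # Positional arguments: ROOTFOLDER/ LISTFILE DB_NAME
    cmd += [image_dir.rstrip("/") + "/", list_file, db_dir]
    return cmd


def build_all(caffe_root, data_root):
    """Return the four commands for training/validation images and labels."""
    data_root = data_root.rstrip("/")
    jobs = [
        ("training", "images", False),
        ("training", "labels", True),    # assumption: gray only for labels
        ("validation", "images", False),
        ("validation", "labels", True),
    ]
    cmds = []
    for split, kind, gray in jobs:
        cmds.append(convert_imageset_cmd(
            caffe_root,
            "%s/%s_%s" % (data_root, split, kind),       # hypothetical input folder
            "%s/%s.txt" % (data_root, split),            # hypothetical list file
            "%s/%s_%s_lmdb" % (data_root, split, kind),  # folder names from the text
            gray=gray,
        ))
    return cmds


commands = build_all("/opt/caffe", "/data/ct")
```

Each returned list can be handed to subprocess.run(cmd, check=True) once the paths match your own setup.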

The two necessary text files which respectively list all the training or validation image filenames along with their integer class labels have the following structure:

img00041.png 1
img00042.png 3
img00048.png 7
img00039.png 2
img00046.png 1

The first column is the training or validation filename. The second column is the integer class label corresponding to the object in its associated file, e.g., 1=car, 3=cat, 7=aeroplane, etc. Note that for this tutorial we had only two class labels: 0=background and 1=subcutaneous adipose tissue. Therefore, for the binary segmentation task, each filename had a 1 as its integer class label. As there are no other class labels, this was actually redundant. However, Caffe requires an integer value in its class label field or else it will crash.
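
For the binary case, such a listing file is easy to generate. The sketch below is one way to do it, assuming all images for a split sit in a single folder and end in .png; the function name and its parameters are hypothetical.

```python
import os


def write_list_file(image_dir, list_path, label=1, ext=".png"):
    """Write one 'filename label' line per image and return the image count."""
    names = sorted(
        name for name in os.listdir(image_dir)
        if name.lower().endswith(ext)
    )
    with open(list_path, "w") as fh:
        for name in names:
            # Every file gets the same placeholder class label in the binary task.
            fh.write("%s %d\n" % (name, label))
    return len(names)
```

Calling write_list_file once for the training folder and once for the validation folder produces the two text files described above.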

Training Day

With our data converted, let’s examine the Caffe command line used to train our FCN-8s:

Figure 2. Caffe command-line options used to train our FCN-8s
  1. $CAFFE_ROOT is defined in Figure 1
  2. The -gpu flag tells Caffe which GPU to use (gpu ‘0’, gpu ‘1’, gpu ‘2’, etc.). For example, on my workstation, gpu ‘0’ maps to a GeForce GTX 1080 while gpu ‘1’ maps to a GeForce GTX 1060.
  3. The -weights flag is the path to the weights of the transfer learning model. That model was trained on a much larger dataset, which increases training and validation performance (again, see Chapter IV for details). Our patient CT dataset contained 6162 images, which is too small for adequate learning even with data augmentation. When deep neural networks are starved for data, their learned parameters (a.k.a. weights) are underdetermined and thus often inaccurate.
  4. The -solver flag is the path to the solver.prototxt file. Let’s take a look at it to learn more about Caffe:
Figure 3. Caffe solver.prototxt describing the optimization details used to train our FCN-8s

We immediately see that the solver.prototxt decouples neural net architecture details (Line 12) from neural net optimization details (Lines 3, 6–8 and 13). The solver type (Line 13) is the optimization method used to minimize prediction error during backpropagation for each training epoch. This minimization of error (or loss) is the actual learning behind deep learning.
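
To make the flags from Figure 2 and the solver fields above more concrete, here is a small Python sketch that renders a solver file as text and assembles the matching caffe train command without executing it. The field names are standard Caffe solver fields, but nearly every value is a placeholder rather than the setting used for this model. Only the net file name and the iteration count come from this post; the snapshot prefix is inferred from the snapshot file name, and the tool path and pretrained-weights file name are hypothetical. The line order does not match Figure 3.

```python
# Fields whose values are protobuf enums and must not be quoted.
ENUM_FIELDS = {"solver_mode"}

# Placeholder values unless noted otherwise.
SOLVER = {
    "net": "training.prototxt",     # architecture file named in this post
    "test_iter": 100,               # hypothetical
    "test_interval": 1000,          # hypothetical
    "base_lr": 0.0001,              # hypothetical
    "lr_policy": "fixed",           # hypothetical
    "momentum": 0.9,                # hypothetical
    "weight_decay": 0.0005,         # hypothetical
    "max_iter": 27740,              # final iteration reported in this post
    "snapshot_prefix": "snapshot",  # inferred from snapshot_iter_27740.caffemodel
    "type": "SGD",                  # hypothetical solver type
    "solver_mode": "GPU",
}


def render_solver(fields):
    """Render a flat dict as solver.prototxt text."""
    lines = []
    for key, value in fields.items():
        if isinstance(value, str) and key not in ENUM_FIELDS:
            value = '"%s"' % value  # string fields are quoted in prototxt
        lines.append("%s: %s" % (key, value))
    return "\n".join(lines) + "\n"


def caffe_train_cmd(caffe_root, solver, weights, gpu=0):
    """Build the argument list for 'caffe train' (nothing is executed)."""
    tool = caffe_root.rstrip("/") + "/build/tools/caffe"  # assumed build layout
    return [tool, "train", "-solver", solver, "-weights", weights, "-gpu", str(gpu)]


solver_text = render_solver(SOLVER)
train_cmd = caffe_train_cmd("/opt/caffe", "solver.prototxt", "pretrained.caffemodel")
```

Keeping the optimization settings in their own file means the learning rate or solver type can be changed without touching the network definition.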

Line 12 points to a file that contains the specifics of the DNN model architecture used for training (data, convolution, pooling layers, etc.). Fully convolutional networks such as the FCN-8s used in this exercise are quite complex as they contain a convolutional neural net and deconvolutional layers to output pixel-wise predictions. Our training.prototxt file (Lines 53–95 shown in Figure 4) specifying the details of our FCN-8s network is 695 lines long! This is the actual deep behind deep learning.

Figure 4. Portion of Caffe’s training.prototxt which details FCN-8s’s network architecture

Adding a Custom Python Layer

Deep learning libraries provide a comprehensive variety of built-in classes for users to reproduce just about any DNN feature. In Figure 4, you’ll note pre-built Caffe layers for convolution and ReLU along with parameter options for learning rate and decay. However, there are times when you wish to create your own layer class to extend a DNN or as part of a whole new DNN architecture. Caffe accepts custom Python layers implemented as Python classes derived from the caffe.Layer base class.

In this binary segmentation exercise where pixels can either be your feature of interest or background, traditional DNN accuracy metrics can give a false sense of performance. This is known as the classic class-imbalance problem. In this exercise, about 85% of all input pixels are black background, with the remaining 15% being subcutaneous adipose tissue. Our fully convolutional neural net can claim to be 85% accurate if it predicts all pixels as background!
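
A toy calculation shows how misleading accuracy can be here. The sketch below assumes the standard Dice definition, twice the overlap divided by the sum of the two mask sizes, and uses a made-up 100-pixel "image" with the 85/15 split mentioned above. It is plain Python for illustration, not the Caffe layer itself.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class matches the label."""
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)


def dice_score(pred, truth):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for binary masks."""
    overlap = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly, so define that case as 1.0.
    return 2.0 * overlap / total if total else 1.0


# Made-up flattened image: 15 foreground pixels, 85 background pixels.
truth = [1] * 15 + [0] * 85

# A lazy model that predicts background everywhere.
all_background = [0] * 100

# A decent model: finds 14 of 15 foreground pixels plus 1 false positive.
near_miss = [1] * 14 + [0] + [1] + [0] * 84

lazy_accuracy = pixel_accuracy(all_background, truth)  # 0.85
lazy_dice = dice_score(all_background, truth)          # 0.0
good_dice = dice_score(near_miss, truth)               # 28 / 30
```

The lazy model keeps its 85% accuracy but scores a Dice of zero, which is why a Dice-based metric is the more honest measure for this task.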

As such, I’ve added a custom Python layer at the end of training.prototxt introducing the Dice scoring metric discussed at length in Chapter V:

Figure 5. Custom Python Dice Layer beginning on Line 685

The Python Dice scoring class is defined as follows:

Figure 6. Python class definition of Dice Scoring metric

Note the import caffe module on line 3. Training (by hitting Enter with the command-line options described in Figure 2) takes 5–6 hours to complete on my modest GPU workstation with the final lines of output looking as follows:

Figure 7. We have a trained model with a 94.6% Dice Score and a 2.7% loss on our validation dataset!

Note at the bottom of Figure 7 we see that the final lines for Iteration 27740 compute the accuracy, Dice Score, and loss for our validation data in that last optimization step. Note also that a few lines above that we see that the final optimized deep neural net model is named snapshot_iter_27740.caffemodel. We will be using it shortly…

Do you remember the reason I had to use Caffe in the first place? I wished to apply my trained FCN-8s model to new data for prediction. In deep learning, this process is known as inference as the model is inferring a result. Now that my trained model has shown excellent learning performance on the training and validation data, let’s feed it some new CT images to see how accurately it segments subcutaneous adipose tissue (SAT, a.k.a., that sexy spare tire around our midsection).


Evan Shelhamer, one of the architects of FCN-8s, posted a very useful Python program on GitHub that runs inference on FCN models designed with Caffe. I have modified it for this tutorial:

Figure 8. Running inference with our trained neural network using

Even for those new to Caffe, there are some obvious aspects of the code.

  1. We again import caffe on Line 3 so we can use Python’s Caffe libraries
  2. Lines 8 and 9 tell Caffe to use the GPU instead of the CPU and specify which GPU to run on (remember my GTX 1080 is GPU ‘0’)
  3. Line 11 is the path to snapshot_iter_27740.caffemodel which is the trained and optimized FCN-8s model from Figure 7
  4. Line 12 is a brand new prototxt file which I’ll get to shortly. deploy.prototxt is identical to training.prototxt except that information about the training and validation data has been removed. This makes sense as we are not feeding the model the data used to train it but instead new data for inference. Instead of data layers with detailed information on the training data, deploy.prototxt has a data layer that contains information on the expected resolution of new input images.
  5. Line 13 is the path to an input folder full of new CT files ready for segmentation by the Caffe model listed in Line 11.
  6. Line 14 is the path to the output folder that stores the segmentation results produced by the Caffe model listed in Line 11
  7. Line 17 is a loop which will read in input files for segmentation
  8. Line 28 sends the deploy.prototxt and trained model as parameters for inference testing
  9. Line 36 has a lot going on under the hood but the reader may recall seeing the score2_classes label in the custom Python layer as well as in the loss and accuracy layers from the training.prototxt which is displayed in Figure 5. Recall this DNN was trained with only two classes, black background and SAT. Pixels from input images are classified as either one or the other. This line of code computes FCN-8s’s segmentation performance using the Dice Score metric defined in the custom Python layer.
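
The two-class decision in item 9 boils down to a per-pixel argmax over the class scores. The sketch below illustrates the idea in plain Python on nested lists, assuming the network emits one score map for background and one for SAT. In a real script the scores would be a NumPy array read from the network's output blob and the same step is typically a single argmax over the class axis. The function name and the toy scores are hypothetical.

```python
def scores_to_mask(background, foreground):
    """Label each pixel 1 (SAT) where its SAT score beats its background score.

    Ties go to class 0 (background), matching how argmax returns the first
    maximal index.
    """
    return [
        [1 if f > b else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, foreground)
    ]


# Toy 2x2 score maps standing in for a 256x256 network output.
background_scores = [[0.9, 0.2],
                     [0.5, 0.1]]
sat_scores = [[0.1, 0.8],
              [0.5, 0.9]]

mask = scores_to_mask(background_scores, sat_scores)
```

The resulting binary mask is what gets saved to the output folder and compared against the ground-truth label.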


Figure 9. SAT = subcutaneous adipose tissue. VAT = visceral adipose tissue. Soft tissue is primarily composed of muscle and organs. Tissues with blue hues include bone and contrasted vasculature. Note white arrows in Inference column showing minor differences from Ground Truth

Patient CT images are grayscale by default. This makes tissue discrimination difficult for the layperson. I have added false color to the image grid in Figure 9 to help the reader discern relevant anatomical features of interest.

In the interest of fidelity, I have not rescaled the output of FCN-8s as scaling introduces artifacts. FCN-8s outputs a 256×256 image consisting of the red SAT pixel prediction mask shown overlaying the patient CT data in the Inference column of Figure 9. Both the input and output image resolutions for FCN-8s are 256×256 pixels. In addition:

  1. Visceral Adipose Tissue (VAT) is colored green in all images
  2. Soft tissue, which is primarily muscle and internal organs, retains its original gray colors from the original CT scan
  3. Tissue denser than fat and soft tissue (e.g. bone) is shown in blue hues denoting relative density (the whiter the anatomy, the denser)
  4. Blood vessels also appear dense (blue-white) as all 3 test examples were contrast-enhanced abdominal CTs. In such scans, patients receive an intravenous iodinated contrast agent that temporarily increases the X-ray attenuation, and hence the apparent density, of their blood. This method is used to enhance the visibility of the patient’s vascular system in CT scans when deemed medically necessary.
  5. In the Inference column, 2 of the 3 images have slight segmentation errors noted by white arrows. The prediction in row 1 missed a few pixels compared to Ground Truth in the adjacent column. The prediction in row 3 confused a few pixels of VAT for SAT.


SAT is indistinguishable from VAT except for spatial location. CT measures tissue density and all body fat is in the same density range. There is no difference between a pixel of SAT and a pixel of VAT except their relative location in the image. Fat is fat…

Our fully convolutional neural net was fed CT abdominal images along with that image’s unique SAT image label, i.e., images with pixel values shown in red in the Ground Truth column of Figure 9. FCN-8s learned the location of SAT in patient scans during training. FCN-8s was able to predict SAT in new patient scans with almost flawless results as shown in Figure 9.

Non-learning algorithms often fail to discriminate SAT from VAT when they are anatomically adjacent, as they are in the first and third examples. At the top of those images, visceral fat is close to subcutaneous fat. It is difficult to write a rule-based algorithm that accounts for all possible scenarios describing SAT and VAT adjacency as human anatomy is extraordinarily varied from patient to patient (see Figures 2 and 3 in Chapter IV). Decision trees and conditionals frequently fall prey to unforeseen anatomical counter-examples. However, FCN-8s successfully learned the spatial characteristics of abdominal fat from its training data and transfer learning.

There is room for improvement. The patients shown in Figure 9 and used to train this model were relatively healthy. Patients with little SAT because of illness, poor nutrition, or age-related wasting will not segment accurately as FCN-8s has a resolution limit of 8×8 pixels. While adequate for this exercise, this resolution may not be adequate for other medical imaging challenges such as determining the segmentation boundary of tumor margins. Medicine will require FCN segmentation limits better than 64 square pixels for such cases. Alternatively, instead of requiring downsampled CT images for training, FCNs that are able to process full resolution CT images (512×512 pixels vs. 256×256 or 227×227 pixels) will mitigate errors stemming from their resolution limit. Performance will also obviously improve if the neural net is trained with more images that reflect these low-SAT patient populations.
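
A quick back-of-the-envelope calculation shows why full-resolution input helps. The sketch below assumes the 8×8 limit corresponds to a prediction stride of 8 pixels and that the native CT slice is 512×512; the function name is hypothetical. It expresses the footprint of one 8×8 block in native CT pixels.

```python
def block_footprint(train_side, native_side=512, stride=8):
    """Return (width, area) of one stride x stride block in native CT pixels."""
    scale = native_side / train_side  # native pixels per training pixel
    width = stride * scale
    return width, width * width


downsampled = block_footprint(256)  # trained on 256x256 images
full_res = block_footprint(512)     # trained on full 512x512 images
```

Under these assumptions, one block covers 16×16 native pixels when training on 256×256 images but only 8×8 when training at full resolution, so thin layers of SAT are less likely to be lost.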

Source: Deep Learning on Medium