Data Augmentation in Medical Images

Original article was published by Cody Glickman on Deep Learning on Medium

How to improve vision model performance by reshaping and resampling data

Photo by CDC on Unsplash

The popularization of machine learning has changed our world in wonderful ways. Some notable applications of machine learning allow us to do the previously unthinkable, like determining if an image is a hot dog or not a hot dog.

Screenshot from HBO’s Not Hotdog app developed by “SeeFood Technologies” in the show, Silicon Valley.

Developing image recognition and classification applications has become much easier in the last few years with the release of open source neural network frameworks like TensorFlow and PyTorch. Usage of these neural network frameworks is predicated on the availability of labeled training data, which has become more accessible within cloud infrastructures. Neural networks require large amounts of data to properly learn the weights between layers. However, in fields like medical imaging, large amounts of labeled training data are not always available. For those interested in medical imaging data, a great resource can be found at Giorgos Sfikas’ GitHub.

How can you effectively train a neural network to classify medical images with limited training data? One answer is to augment the labeled data you already have and feed the transformed images into your model. Augmentation serves two purposes. First, additional labeled training data from augmentation should, in theory, improve your image classification model’s accuracy (warning: it can also lead to overfitting). Second, the transformations allow the model to train on orientation variations, potentially giving it flexibility when it encounters subtle variation shifts in testing or real-world data.

Does it actually work?

Below is the accuracy of a model trained both with and without data augmentation. I will go into more details about these results later in the article.

Performance of x-ray multiclass classification using different forms of augmentation. Classes counted as above random have an area under the curve (AUC) score greater than 0.5. Image by author.

A decent improvement on a small training set. I used only 2GB of the 40GB of total data to train the model. Data augmentation reminds me of semi-supervised learning in that you are creating new labeled data to train a model. Data augmentation is also similar to oversampling techniques. For those interested in learning more about semi-supervised methods, check out the article below by Andre Ye.

Data augmentation is most commonly applied to images. There are two broad themes of data augmentation: the first is image transformation and the second is synthetic image creation. For the purpose of this article, I will focus primarily on image transformations, with an application in medical imaging using Python. Parts of the code used in this demo are adapted from the AI for Medical Diagnosis course. The code repository can be found on GitHub, and the data used for the modeling can be obtained from the NIH Clinical Center Chest X-Ray database.

How do we perform data augmentation?

Image manipulation in Python can be performed with multiple libraries. PIL and Augmentor are two examples of libraries that can operate directly on images. Augmentor also includes a pipelining function to operate over several images at once. For the purposes of this article, I utilize ImageDataGenerator, part of keras_preprocessing. Types of image augmentations include rotation, cropping, zooming, color range changes, grayscaling, and flipping. Augmentor also includes a random noise subsection creator for object detection models. When performing any type of data augmentation, it is important to keep in mind the output of your model and whether augmentation would affect the resulting classification. For example, in x-ray data the heart typically appears on the right of the image; however, the image below shows that a horizontal flip augmentation inadvertently creates a medical condition called situs inversus.

Data augmentation may create issues with class labels if position is important. The yellow circle highlights the heart. This is the same image horizontally flipped. Image by author.
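The pitfall is easy to reproduce without any imaging library. A toy numpy sketch (the array, its values, and the “heart” position are illustrative, not real x-ray data):

```python
import numpy as np

# Toy 1-channel "x-ray": a bright patch marks the heart on the right half.
xray = np.zeros((4, 4), dtype=np.uint8)
xray[1:3, 2:4] = 255  # heart on the right side of the image

# A horizontal flip (what ImageDataGenerator(horizontal_flip=True) applies
# at random) moves the heart to the left half -- anatomically, situs inversus.
flipped = np.fliplr(xray)

print(int(xray[1, 3]), int(flipped[1, 3]))  # 255 0 -- the heart changed sides
```

The pixel values are unchanged; only their positions move, which is exactly why position-sensitive labels can break under flipping.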

Data augmentation used in this article

For the purposes of this article, I used three levels of data augmentation. First, I ran a model without any augmented images. Next, I used a basic color normalizing augmentation. Finally, I created a model using complex augmentations like zooming, rotating, and cropping images as shown in the example below.
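The three levels above can be sketched as plain keyword-argument sets for keras’ ImageDataGenerator (the parameter names are standard ImageDataGenerator arguments; the specific values mirror the ones used later in the article and are a starting point, not a tuned recipe):

```python
none_aug = {}  # level 1: raw images, no augmentation

basic_aug = {  # level 2: per-image color normalization only
    "samplewise_center": True,
    "samplewise_std_normalization": True,
}

complex_aug = {  # level 3: normalization plus geometric transforms
    **basic_aug,
    "rotation_range": 5,   # degrees
    "shear_range": 0.02,
    "zoom_range": 0.02,
}
# e.g. image_generator = ImageDataGenerator(**complex_aug)
```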

Example of data augmentation used in complex augmentation training. Image by author.

Implementing augmentation methods (code)

Data and libraries

The full code can be found on the article’s GitHub. The data for this tutorial can be found in the NIH Clinical Center Chest X-Ray database. In this example, I only utilize the data from images_001.tar.gz, which unzips to about 5K images (~2GB). I also downloaded the image labels as Data_Entry_2017_v2020.csv. The libraries used to perform data augmentation require keras and keras-preprocessing. I installed these packages using conda.

### Augmentation
from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
### Visuals
import matplotlib.pyplot as plt
import pandas as pd
### Modeling
from tensorflow.keras.applications.densenet import DenseNet121
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.models import load_model
from tensorflow.keras import backend as K

When creating the models, I ran into the following error:

AttributeError: module 'tensorflow.python.framework.ops' has no attribute '_TensorLike'

Solution: Add tensorflow before keras import call as seen below

from tensorflow.keras.applications.densenet import DenseNet121
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

Preparing the images and metadata

To assign labels to the x-ray images, I needed to binarize the condition column in the metadata. There are 15 unique conditions in this study:

['Cardiomegaly', 'Emphysema', 'Effusion', 'Hernia', 'Infiltration', 'Mass', 'Nodule', 'Atelectasis','Pneumothorax','Pleural_Thickening', 'Pneumonia', 'Fibrosis', 'Edema', 'Consolidation', 'No Finding']

Patients can have more than one condition in an x-ray. I used scikit-learn to munge the data into the appropriate format with binary values for the 14 conditions excluding the ‘No Finding’ category.
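Before binarizing, the condition column needs to hold lists of conditions rather than strings. A minimal sketch, assuming the pipe-delimited “Finding Labels” column name from the NIH metadata CSV (verify the column name against your download):

```python
import pandas as pd

# Two example rows mimicking the metadata format: multiple conditions
# per x-ray are pipe-delimited in a single string.
df = pd.DataFrame({"Finding Labels": ["Cardiomegaly|Effusion", "No Finding"]})
df["labels"] = df["Finding Labels"].str.split("|")
print(df["labels"].tolist())  # [['Cardiomegaly', 'Effusion'], ['No Finding']]
```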

from sklearn.preprocessing import MultiLabelBinarizer
### Binarise labels
mlb = MultiLabelBinarizer()
expandedLabelData = mlb.fit_transform(df["labels"])
labelClasses = mlb.classes_
### Create a DataFrame from our output
expandedLabels = pd.DataFrame(expandedLabelData, columns=labelClasses)
expandedLabels['Images'] = df['Image Index']
expandedLabels['ID'] = df['Patient ID']

I added the paths to the corresponding x-ray image as a new column in the multicolumn binarized dataframe.
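A minimal sketch of that step (the filenames and the “images” directory here are illustrative, not the article’s exact layout):

```python
import os
import pandas as pd

# Hypothetical filenames in the style of the NIH image index.
expandedLabels = pd.DataFrame({"Images": ["00000001_000.png", "00000002_000.png"]})

# Join each image filename onto the directory holding the unpacked x-rays.
expandedLabels["path"] = expandedLabels["Images"].apply(
    lambda name: os.path.join("images", name))
print(expandedLabels["path"].tolist())
```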

Next, to test the modeling performance, I split the data into training (80%) and testing (20%) groups. The figure below shows the frequency of the classes in the training dataset.
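One detail worth making explicit: patients can appear in multiple x-rays, so a naive row-level split can leak the same patient into both groups. A sketch of a patient-grouped 80/20 split using scikit-learn’s GroupShuffleSplit (the toy dataframe is illustrative; the article’s actual split procedure may differ):

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy metadata: two x-rays per patient, five patients.
meta = pd.DataFrame({
    "Images": [f"img_{i}.png" for i in range(10)],
    "ID": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
})

# Grouping on patient ID keeps all of a patient's x-rays in one split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=1)
train_idx, test_idx = next(splitter.split(meta, groups=meta["ID"]))
training, testing = meta.iloc[train_idx], meta.iloc[test_idx]

assert set(training["ID"]).isdisjoint(set(testing["ID"]))
```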

Frequency of condition in training data. Image by author.

Build generators to augment the images

ImageDataGenerator is capable of processing images into a generator object to avoid loading all the image transformations into memory. ImageDataGenerator is also able to create a generator directly from a pandas dataframe. I built the generator with the code below:

def get_train_generator(df, image_dir, x_col, y_cols, shuffle=True, batch_size=8, seed=1, target_w=320, target_h=320):
    ### Perform data augmentation here
    image_generator = ImageDataGenerator(
        rotation_range=5,
        shear_range=0.02,
        zoom_range=0.02,
        samplewise_center=True,
        samplewise_std_normalization=True)
    ### Create the image generator directly from the dataframe
    generator = image_generator.flow_from_dataframe(
        dataframe=df,
        directory=image_dir,
        x_col=x_col,
        y_col=y_cols,
        class_mode="raw",
        batch_size=batch_size,
        shuffle=shuffle,
        seed=seed,
        target_size=(target_w, target_h))
    return generator

To change the amount of augmentation, adjust the keyword arguments passed to ImageDataGenerator. To call this generator, use the following lines:

IMAGE_DIR = "images/"
train_generator = get_train_generator(training, IMAGE_DIR, "Images", labels)

I also built a generator for the testing data.
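One caveat worth spelling out: the testing generator should keep the per-image normalization but drop the random geometric transforms, so evaluation images resemble real-world inputs. A sketch of that configuration (same ImageDataGenerator parameter names as above):

```python
# Test-time configuration: normalization only, no random augmentation.
test_aug = {
    "samplewise_center": True,
    "samplewise_std_normalization": True,
}

# These belong only in the training generator.
train_only_params = {"rotation_range", "shear_range", "zoom_range"}
assert train_only_params.isdisjoint(test_aug)
# test_generator = ImageDataGenerator(**test_aug).flow_from_dataframe(...)
```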

Building the deep learning model

I used a DenseNet121 architecture pre-trained with ImageNet weights.

### Pre-trained model
base_model = DenseNet121(weights='imagenet', include_top=False)
### Add spatial average pooling and a logistic layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(len(labels), activation="sigmoid")(x)
model = Model(inputs=base_model.input, outputs=predictions)
### Binary cross-entropy pairs with the per-class sigmoid outputs in a multi-label task
model.compile(optimizer='adam', loss='binary_crossentropy')
### Fit the model and predict on the validation data
model.fit(train_generator, validation_data=valid_generator, steps_per_epoch=100, validation_steps=25, epochs=10)
predicted_vals = model.predict(valid_generator, steps=len(valid_generator))

Visualizing the output

The model predictions were visualized using ROC curves. The AUC values for each iteration were saved into the table below:

AUC values by augmentation status. Image by author.

I created an AUC curve for each condition and augmentation status.

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def get_roc_curve(labels, predicted_vals, generator):
    auc_roc_vals = []
    for i in range(len(labels)):
        try:
            gt = generator.labels[:, i]
            pred = predicted_vals[:, i]
            auc_roc = roc_auc_score(gt, pred)
            auc_roc_vals.append(auc_roc)
            fpr_rf, tpr_rf, _ = roc_curve(gt, pred)
            plt.figure(1, figsize=(10, 10))
            plt.plot([0, 1], [0, 1], 'k--')
            plt.plot(fpr_rf, tpr_rf,
                     label=labels[i] + " (" + str(round(auc_roc, 3)) + ")")
            plt.xlabel('False positive rate')
            plt.ylabel('True positive rate')
            plt.title('ROC curve')
            plt.legend(loc='best')
        except ValueError:
            print(
                f"Error in generating ROC curve for {labels[i]}. "
                f"Dataset lacks enough examples."
            )
    return auc_roc_vals

auc_rocs = get_roc_curve(labels, predicted_vals, valid_generator)
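The “classes above random” metric in the summary table can be computed directly from the returned AUC list. A sketch with made-up values (not the article’s real numbers):

```python
# Illustrative per-class AUC values, as returned by get_roc_curve.
auc_rocs = [0.87, 0.62, 0.55, 0.48, 0.51]

# A class counts as "above random" when its AUC exceeds 0.5.
above_random = sum(auc > 0.5 for auc in auc_rocs)
print(above_random)  # 4
```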

No augmentation

Performance of model with no augmentation. Image by author.

Basic augmentation

Performance of model with basic augmentations. Image by author.

Complex augmentation

Performance of model with complex augmentations. Image by author.

The table summarizing the performance of the models using augmentation is shown below again:

Performance of x-ray multiclass classification using different forms of augmentation. Classes counted as above random have an area under the curve (AUC) score greater than 0.5. Image by author.

Discussion and conclusions

In this article, I introduced the concept of data augmentation and demonstrated its relative performance improvement in a small multiclass recognition task. Data augmentation is a useful tool to expand the amount of available labeled data for deep learning models. I described some types of data augmentation and introduced potential pitfalls of augmenting without considering the classification’s orientation. In this dataset, the complex augmentation performs poorly at identifying hernias in chest x-rays. The hernias visible in chest x-rays, such as hiatal hernias, typically appear in the tissue near the diaphragm. With the complex augmentation, the color adjustment or the rotation may impair the model’s ability to distinguish a hernia from the surrounding tissue.

Image showing the relative location of a hiatal hernia. Image from Wikipedia.

The modeling utilizes only a small subset of the total available data. The advantages of data augmentation may be more pronounced with more than the 4,000 training images used here (24,000 in the complex augmentation). The code for this article can be found on GitHub. Again, for those interested in medical imaging datasets, a great resource can be found at Giorgos Sfikas’ GitHub. My name is Cody Glickman and I can be found on LinkedIn. Be sure to check out some of my other articles below:

A rotation augmentation. Image by author.