Detecting Pneumonia in Chest X-Rays with Convolutional Neural Networks

Original article was published on Deep Learning on Medium

Chest X-Rays

Our dataset consists of a series of chest x-rays that fall into 2 classes:

Normal Chest X-Ray
Pneumonia Chest X-Ray

To the untrained eye, it is difficult to tell the 2 images apart. A layperson would struggle to decide whether a chest x-ray shows pneumonia, so let us see how our data can be used to train a useful model.

The dataset is split into 3 parts:

  • Train — Data to train our model
  • Val — Validation data used to adjust our model during training
  • Test — Data to determine the overall performance of our model

The validation set is too small to be useful for adjusting our model during training. Ideally, we would have an 80/20 Train/Val split, so we will merge the validation dataset into the training dataset and re-split the combined data 80/20.
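As a rough illustration of that re-split, here is a minimal sketch in plain Python. The file names, the `resplit` helper, and the fixed seed are assumptions made for the example, not the code used in the notebook.

```python
import random

def resplit(train_items, val_items, val_fraction=0.2, seed=42):
    """Pool the original train and val items, shuffle, and cut a new split."""
    pooled = list(train_items) + list(val_items)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pooled)
    n_val = round(len(pooled) * val_fraction)
    # The first n_val shuffled items become validation, the rest training.
    return pooled[n_val:], pooled[:n_val]

# Hypothetical file names standing in for the real image paths.
train_files = [f"train_{i}.jpeg" for i in range(90)]
val_files = [f"val_{i}.jpeg" for i in range(10)]

new_train, new_val = resplit(train_files, val_files)
print(len(new_train), len(new_val))
```

With 100 pooled files this yields 80 training and 20 validation items, and no file appears in both sets.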

Convolutional Neural Network

When using a normal neural network to process images, we flatten the matrix of pixel values into a 1-D array, which is then fed into the network as an ordinary input vector.

However, Convolutional Neural Networks are better suited to image recognition because of their use of a ‘kernel’. Instead of treating each pixel as 1 node in the layer, we apply a small matrix of weights, or ‘kernel’, over the 2D input data.

The 2D convolution is a fairly simple operation at heart: you start with a kernel, which is simply a small matrix of weights. This kernel “slides” over the 2D input data, performing an elementwise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel. — Source
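To make the sliding-window description concrete, here is a small sketch of that operation in plain Python, with no padding and a stride of 1. The `conv2d` helper and the toy image and kernel values are made up for illustration. Like most deep learning libraries, it does not flip the kernel.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the output map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Elementwise multiply the kernel with the patch under it, then sum.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output

# Toy 4x4 "image" and 2x2 kernel (illustrative values only).
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 2],
    [2, 1, 0, 1],
    [1, 0, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, -1, -2], [-1, 1, 2], [2, -1, -3]]
```

Each output pixel here is the top-left value of a 2x2 patch minus its bottom-right value, so the 4x4 input shrinks to a 3x3 feature map.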

Using a kernel to build a convolution (image right) gives us several benefits:

  • Layers with a smaller number of nodes or parameters
  • Fewer parameters mean fewer connections between layers, making the network more efficient
  • Parameter sharing, which allows us to reuse what we learn from one part of an image on a different part of the same or another image
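The savings in parameters can be checked with some quick arithmetic. The sketch below compares a fully connected layer with a convolutional layer. The 224x224 single-channel input, the 64 units, and the 3x3 kernel size are assumptions picked for the example and are not taken from our model.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases for a 2D convolution with square kernels."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

# Assumed sizes: a 224x224 grayscale image feeding 64 units / 64 kernels.
dense = dense_params(224 * 224, 64)
conv = conv_params(3, 1, 64)

print(dense)  # 3211328
print(conv)   # 640
```

Because the same 3x3 weights are reused at every position, the convolutional layer's parameter count does not grow with the image size.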

Let’s take a look at the CNN we will use on this dataset:

Our network has 8 layers. The convolutional layers successively apply a kernel to the 2D input, followed by a rectified linear unit (ReLU), to extract features from the image. The final linear layers act as a classifier: they take the extracted features as input and output the probability of each class, Normal or Pneumonia.
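The general shape of such a network can be sketched in PyTorch as follows. This is a deliberately tiny stand-in and not the 8-layer architecture from the notebook: the `TinyCNN` name, the channel counts, the single-channel 64x64 input, and the pooling layers are all assumptions.

```python
import torch
from torch import nn

class TinyCNN(nn.Module):
    """Illustrative conv + ReLU feature extractor followed by a linear classifier."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # kernel over the 2D input
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # one value per feature map
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)      # (batch, 16)
        return self.classifier(x)    # raw scores, one per class

model = TinyCNN()
batch = torch.randn(4, 1, 64, 64)    # 4 fake grayscale images
probs = torch.softmax(model(batch), dim=1)
print(probs.shape)
```

The softmax turns the two raw scores into probabilities for Normal and Pneumonia that sum to 1 for each image.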

After training our network, it reports a 96% accuracy on the validation dataset.

We see the accuracy rise sharply from the 0.27 of random guessing and gradually flatten over successive epochs.

To evaluate the overall performance of the model, we use our newly trained model to make predictions on the test dataset.

The actual accuracy is only 76%, which, while good, isn’t good enough to support any medical decisions.

Let’s examine where our errors lie.

Taking a closer look at the predictions, we see that this model is biased in favor of predicting Pneumonia over Normal. Our model correctly classified a normal chest x-ray as Normal 85 times but incorrectly classified a normal chest x-ray as Pneumonia 149 times. That is nearly 2:1 in favor of predicting pneumonia where it should not.

Misdiagnosing a patient with Pneumonia can result in unnecessary medical intervention. Clearly, we need to adjust the model to lower the rate at which it classifies a normal chest x-ray as showing Pneumonia. While this is a problem to address in our model, maintaining accuracy on true Pneumonia cases is also important, as a missed diagnosis can mean much worse outcomes for a patient.

Another Look at the Data

After examining the training dataset more carefully, we can see that the problem lies in the training set being imbalanced.

The training dataset is heavily skewed towards pneumonia chest x-rays. A model that is fed more pneumonia training data will tend to predict pneumonia more often.

Using PyTorch’s WeightedRandomSampler, our dataloader will give us batches that are roughly balanced between Pneumonia and Normal chest x-rays when training the model.
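One common way to set this up is to weight every sample by the inverse of its class frequency. The sketch below assumes a made-up label list with a 1:3 skew and is not the notebook's exact code.

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical labels: 0 = Normal, 1 = Pneumonia, skewed 1:3.
labels = [0] * 250 + [1] * 750

# Each sample is weighted by 1 / (size of its class), so rare classes are drawn more often.
class_counts = [labels.count(0), labels.count(1)]
sample_weights = [1.0 / class_counts[label] for label in labels]

torch.manual_seed(0)  # only to make this example repeatable
sampler = WeightedRandomSampler(
    sample_weights, num_samples=len(labels), replacement=True
)

# Iterating the sampler yields dataset indices; look up their labels to check the mix.
drawn = [labels[i] for i in sampler]
normal_share = drawn.count(0) / len(drawn)
print(f"share of Normal samples drawn: {normal_share:.2f}")
```

The sampler is then passed to the DataLoader through its `sampler` argument in place of `shuffle=True`, since the two options cannot be combined. Sampling is done with replacement, so the balance holds on average and some Normal images are seen several times per epoch.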

Convolutional Neural Network v2

Using the same CNN with a WeightedRandomSampler to adjust for the class imbalance, we train the model once more and evaluate.

We have gone from 76% to 78% accuracy. A small improvement, but let’s see whether the model has become less biased toward predicting Pneumonia.

We can see the improvement we were looking for after we balanced the training data. The model has gotten better at detecting normal chest x-rays, going from correctly identifying normal chest x-rays 85 times to 104 times, without any loss in the number of correct pneumonia classifications.

There is still room for improvement, so we will add some more complexity to our model.

Convolutional Neural Network v3

For our 3rd attempt, we will try various methods to improve our model by using:

  • Data augmentation
  • Additional layers
  • Regularization techniques

We augment our chest x-rays so that each one is skewed, horizontally flipped, rotated, or any combination thereof. This keeps the model from relying too heavily on characteristics in a particular section of the image, in the hope of improving accuracy.

We add 3 layers to our model, for a total of 11 layers.

We also deploy regularization techniques such as dropout and batch normalization. The dropout layers randomly drop a node’s output so that it does not progress to the next layer, and batch normalization normalizes the outputs of a layer so that nodes with much larger outputs than the others do not drown them out.
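To show the idea in its simplest form, here is a sketch of a random horizontal flip in plain Python. The `random_horizontal_flip` helper and the toy image are hypothetical. In practice, libraries such as torchvision.transforms provide ready-made random flips, rotations, and affine (skew) transforms.

```python
import random

def random_horizontal_flip(image, p=0.5, rng=random):
    """Return a mirrored copy of `image` with probability p, else an unchanged copy."""
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]

# Toy 2x3 "image" (illustrative pixel values).
image = [
    [1, 2, 3],
    [4, 5, 6],
]

always_flipped = random_horizontal_flip(image, p=1.0)
never_flipped = random_horizontal_flip(image, p=0.0)
print(always_flipped)  # [[3, 2, 1], [6, 5, 4]]
print(never_flipped)   # [[1, 2, 3], [4, 5, 6]]
```

Because the flip is decided anew each time an image is loaded, the model sees slightly different versions of the same x-ray across epochs.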
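Both techniques can be sketched in a few lines of plain Python. This is a simplified illustration with hypothetical helper names: the dropout shown is the "inverted" variant that rescales the surviving values, and the batch normalization omits the learnable scale and shift that a real layer would also apply.

```python
import random
import statistics

def dropout(values, p=0.5, rng=random):
    """Zero each value with probability p and rescale survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def batch_norm(values, eps=1e-5):
    """Normalize one feature across a batch to roughly zero mean and unit variance."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

# One node's outputs across a batch of 4 images, with one very large value.
activations = [1.0, 2.0, 3.0, 100.0]
normalized = batch_norm(activations)
print([round(v, 2) for v in normalized])

dropped = dropout([1.0] * 10, p=0.5, rng=random.Random(0))
print(dropped)  # each entry is either 0.0 or 2.0
```

After normalization the large value no longer dominates by two orders of magnitude, and after dropout roughly half of the outputs are silenced for that training step.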

Let’s examine how these added complexities affect the performance of our model.

The accuracy has improved again, from 78% to 82%, but we need to examine how it has affected our rate of correct normal and correct pneumonia detection.

Success! Our normal detection has improved yet again. We’ve improved our correct normal chest x-ray predictions from 104 to 127. We are finally more correct than incorrect (127:107) on predicting a normal chest x-ray.

The number of correct pneumonia predictions decreased slightly, from 385 to 384. Unfortunately, we missed 6 cases of Pneumonia, incorrectly classifying those chest x-rays as normal.
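The counts quoted above are enough to recompute the headline metrics. The sketch below treats Pneumonia as the positive class. The `binary_metrics` helper is a hypothetical name, while the four counts are the ones reported in the text.

```python
def binary_metrics(tp, fn, tn, fp):
    """Summary metrics from the four cells of a 2-class confusion matrix."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),        # share of pneumonia cases caught
        "specificity": tn / (tn + fp),   # share of normal cases recognized
        "precision": tp / (tp + fp),     # share of pneumonia calls that were right
    }

# Counts reported for this model: 384 correct and 6 missed pneumonia,
# 127 correct and 107 incorrect normal.
metrics = binary_metrics(tp=384, fn=6, tn=127, fp=107)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

This gives an accuracy of about 0.82, matching the 82% above, with a recall of about 0.98 on pneumonia but a specificity of only about 0.54 on normal x-rays.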

For good measure, here’s the ROC:

It appears our model can only do so much, so let’s see if models built by others can help us.

Transfer Learning

ResNet34 is a residual neural network with 34 layers. The model has been pretrained on the ImageNet dataset, which contains over a million everyday images, to classify them into 1,000 classes with a high degree of accuracy.

Using transfer learning, we utilize the existing knowledge in ResNet34 as a base to help build a more accurate model for our dataset. ResNet34 makes a good base because of its training on everyday images, but it’s important to note that, used as is, it is not optimal for identifying medical images or other out-of-the-ordinary images.

An example CNN Source: NVIDIA

We are only interested in using the feature extraction portion of ResNet34 as the base of our model. If we used all of ResNet34, it would include the classification layers that determine whether the extracted features constitute a dog, a cat, or any of the myriad other classes it has been trained on, while we are only interested in our 2 classes of images. Therefore, we will replace the blue portion in the figure above with our own classifier that takes the extracted features as input and outputs probabilities for the 2 classes of our dataset.

From the PyTorch models package, we get ResNet34 using the pretrained=True option so it comes with knowledge from its training on ImageNet. We then replace the last layer with our own classification layer that reduces the outputs to 2 classes.

The freeze method sets requires_grad to False for the feature extraction layers but leaves the classification layer’s requires_grad set to True. We do this to turn off learning on the feature extraction layers so as not to overwrite their previous training. As a result, the first training cycle will only train the last classification layer.

Finally, we unfreeze the network and train all of it on our dataset with a lower learning rate. This customizes the pre-trained network to our dataset and gives us a small bump in the overall accuracy.
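The freeze-then-unfreeze pattern can be sketched as follows. To keep the example self-contained, a tiny stand-in backbone takes the place of the pretrained ResNet34. The `TransferModel` class, the helper names, and both learning rates are assumptions made for illustration. With torchvision's ResNet34, the layer being replaced would be its final `fc` layer.

```python
import torch
from torch import nn

class TransferModel(nn.Module):
    """Stand-in for a pretrained feature extractor plus a new 2-class head."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(      # plays the role of the pretrained layers
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.head = nn.Linear(8, num_classes)  # our own classification layer

    def forward(self, x):
        return self.head(self.backbone(x))

def set_backbone_trainable(model, trainable):
    """Freeze (False) or unfreeze (True) the feature extraction layers."""
    for param in model.backbone.parameters():
        param.requires_grad = trainable

def count_trainable(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

model = TransferModel()

# Cycle 1: freeze the backbone so only the new head is updated.
set_backbone_trainable(model, False)
trainable_when_frozen = count_trainable(model)
head_optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# Cycle 2: unfreeze everything and fine-tune with a lower learning rate.
set_backbone_trainable(model, True)
trainable_when_unfrozen = count_trainable(model)
full_optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

print(trainable_when_frozen, trainable_when_unfrozen)
```

In this toy model only the 18 head parameters are trainable during the first cycle, and all 242 parameters are trainable in the second. The lower learning rate in the second cycle is meant to nudge the pretrained weights without erasing what they already encode.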

The following figure shows two bumps representing the learning rates over the two training cycles: one cycle with training off on the extraction layer, and one cycle training the entire network.

We examine its accuracy over training and evaluate the model overall.

Looks like ResNet34 was a good base on which to build our model. The accuracy is at 87% and, importantly, we’ve improved both our correct normal chest x-ray predictions, from 127 to 156, and our correct pneumonia predictions, from 384 to 389, and reduced our false negatives from 6 to 1.

This model using transfer learning has outperformed our 11-layer CNN.

And the ROC plot:


We have explored pneumonia chest x-ray classification on a convolutional neural network and improved upon its accuracy through various techniques.

We fixed a class imbalance in the training data, added layers, and deployed data augmentation and regularization techniques to improve the model’s performance.

Finally, we used transfer learning to utilize a pre-built model to test its effectiveness on classifying chest x-rays.

To see the full source code, check out my notebook. Explore the previous versions to see the starting CNN and its subsequent iterations.