Pneumonia Detection using Convolutional Neural Network

Original article was published on Artificial Intelligence on Medium

Having completed my AIML course, I wanted to put my new-found knowledge to use on side projects that would give me practical exposure to the field and either solve an interesting real-life problem or be of some practical use. After searching for a project that fit, I decided to build a convolutional neural network (CNN) that detects pneumonia from chest X-ray images.

What is Pneumonia?

Pneumonia is a form of acute respiratory infection that affects the lungs. The lungs are made up of small sacs called alveoli, which fill with air when a healthy person breathes. When an individual has pneumonia, the alveoli are filled with pus and fluid, which makes breathing painful and limits oxygen intake.

Pneumonia is the single largest infectious cause of death in children worldwide. Pneumonia killed 808 694 children under the age of 5 in 2017, accounting for 15% of all deaths of children under five years old. Pneumonia affects children and families everywhere but is most prevalent in South Asia and sub-Saharan Africa.

Project Overview

With these facts as inspiration, I started working on my project.

First, I needed somewhere to build and run the code, because my own machine’s specs were not sufficient for the computation that training a model requires. I used Google Colab, which lets you write and execute Python in your browser, with:

  • Zero configuration required
  • Free access to GPUs
  • Easy sharing

Whether you’re a student, a data scientist, or an AI researcher, Colab can make your work easier.

Note: a Kaggle API key is required

My code also downloads the dataset directly from Kaggle. The important things to note are:

  1. Have your API key file, i.e. ‘kaggle.json’, ready.
  2. With this method, every time the runtime resets in Google Colab you lose all your files, including the generated model. Once training is done, always download your model file, and after every reset remember to re-import the dataset.
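One way the Kaggle setup might look in a Colab cell is sketched below. The dataset slug `paultimothymooney/chest-xray-pneumonia`, the helper names `install_kaggle_key` and `download_dataset`, and the `data` folder are assumptions made for illustration, and the download step needs the `kaggle` command-line tool and network access.

```python
import os
import shutil
import subprocess
from pathlib import Path

# Assumed slug for Paul Mooney's "Chest X-Ray Images (Pneumonia)" dataset.
DATASET = "paultimothymooney/chest-xray-pneumonia"


def install_kaggle_key(key_file, home=None):
    """Copy kaggle.json into <home>/.kaggle/ where the Kaggle CLI looks for it."""
    kaggle_dir = Path(home or Path.home()) / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    shutil.copy(key_file, target)
    os.chmod(target, 0o600)  # keep the key readable by the owner only
    return target


def download_dataset(dataset=DATASET, dest="data"):
    """Download and unzip the dataset with the Kaggle CLI (needs network access)."""
    subprocess.run(
        ["kaggle", "datasets", "download", "-d", dataset, "-p", dest, "--unzip"],
        check=True,
    )
```

After uploading `kaggle.json` to the session, calling `install_kaggle_key("kaggle.json")` followed by `download_dataset()` would fetch the images again, which is the step to repeat after every runtime reset.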

What Am I Trying to Achieve?

When you submit a chest X-ray image of a 5-year-old child, the model should predict with high accuracy whether the patient has pneumonia.

The Code

  • First, upload your API key file to your Jupyter notebook in Colab:
  • Next, import the dataset from Kaggle and unzip it:
  • I used the Chest X-Ray Images (Pneumonia) dataset by Paul Mooney because the data is already conveniently split into train, val, and test folders:
      • Train: contains the training images used to teach our model.
      • Val: contains images used to validate our model. The purpose of this set is to help guard against overfitting. Overfitting happens when a model learns the detail and noise in the training data to the extent that it hurts the model’s performance on new data: the noise or random fluctuations in the training data are picked up and learned as concepts, but these concepts do not apply to new data and so reduce the model’s ability to generalize.
      • Test: contains the data used to test the model once it has learned the relationships between the images and their labels (Pneumonia/Not-Pneumonia).
  • Now, let us start by importing all the required libraries:
  1. The Keras Python library makes creating deep learning models fast and easy. Its Sequential API lets you build models layer by layer, which suits most problems. It is limited in that it does not allow models that share layers or have multiple inputs or outputs.
  2. Conv2D is the Keras 2D convolution layer. It creates a convolution kernel that is convolved with the layer’s input to produce a tensor of outputs. (Note: in image processing, a kernel is a convolution matrix or mask that can be used for blurring, sharpening, embossing, edge detection, and more, by convolving the kernel with an image.)
  3. MaxPooling2D from keras.layers performs the pooling operation. This network uses max pooling; other types of pooling exist, such as min pooling and mean (average) pooling. Max pooling keeps the maximum pixel value from each region of interest.
  4. Flatten from keras.layers performs flattening: converting all the resulting 2-dimensional arrays into a single long continuous linear vector.
  5. Dense from keras.layers adds the fully connected layers of the neural network.
  6. ImageDataGenerator takes a batch of images, applies a series of random transformations to each image in the batch (random rotation, resizing, shearing, etc.), and then replaces the original batch with the new, randomly transformed batch for training the CNN.
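To make the pooling and flattening descriptions above concrete, here is a small NumPy sketch. It is not the Keras implementation; the helper `max_pool_2x2` and the 4×4 toy feature map are made up for illustration, and a 2×2 pooling window is assumed.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 region."""
    h, w = feature_map.shape
    # Trim odd edges so the map splits evenly into 2x2 blocks.
    trimmed = feature_map[: h - h % 2, : w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


# Toy 4x4 feature map standing in for the output of a convolution.
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
])

pooled = max_pool_2x2(feature_map)  # 2x2 array: [[6, 2], [2, 9]]
flattened = pooled.flatten()        # 1-D vector: [6, 2, 2, 9]
print(pooled)
print(flattened)
```

Pooling halves each spatial dimension here, and flattening turns the remaining 2-D grid into the 1-D vector that a Dense layer expects.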
  • I first defined two variables that hold the image dimensions and the batch size I am going to use:
  • I then created an object of the Sequential class and started coding the convolutional step:
  • I took the classifier object and added a convolution layer using the “Conv2D” function, which here takes 4 arguments:
  1. The first is filters, a mandatory Conv2D parameter that defines the number of filters the convolutional layer will learn, 32 here. Each filter slides across the image and learns to respond to a different pattern in the input. Imagine a small filter sliding left to right across the image, from top to bottom, looking for, say, a dark edge. Each time a match is found, it is mapped onto an output image.
  2. The second argument is the shape of each filter, 3×3 here.
  3. The third is the input shape, which gives the resolution and type (RGB or black and white) of each image. Our CNN takes 64×64 input images, and the “3” stands for the three RGB channels of a color image.
  4. The fourth argument is the activation function, here ‘relu’, which stands for Rectified Linear Unit. The activation function decides whether, and how strongly, a neuron fires. ReLU sets all negative values in the matrix x to zero and leaves all other values unchanged.
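The sliding-filter picture and the ReLU step can be sketched in plain NumPy. This is an illustration only: the 5×5 toy image and the hand-written vertical-edge kernel are assumptions, whereas a real Conv2D layer learns its 32 kernels during training. Like Keras, the sketch slides the kernel without flipping it.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):          # top to bottom
        for j in range(out_w):      # left to right
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Set negative values to zero and leave the rest unchanged."""
    return np.maximum(x, 0)


# Toy 5x5 grayscale image: bright on the left, dark on the right.
image = np.array([[1, 1, 1, 0, 0]] * 5, dtype=float)

# Hypothetical 3x3 filter that responds to bright-to-dark vertical edges.
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)  # each row is [0, 3, 3]: the filter fires where the edge is
```

With no padding, a 3×3 filter shrinks each side by 2, so the 5×5 image gives a 3×3 feature map and a 64×64 input would give 62×62.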
  • Now I perform a pooling operation on the feature maps produced by the convolution. The primary aim of pooling is to reduce the size of the feature maps while keeping their strongest responses.
  • Next, I convert all the pooled feature maps into one continuous vector through flattening. Flattening is an important step to understand: we take the 2-D arrays of pooled pixel values and convert them into a single one-dimensional vector.
  • To create a fully connected layer, I connected the set of nodes obtained from the flattening step; these nodes act as the input layer to the fully connected layers. Dense is the function that adds a fully connected layer, and ‘units’ defines the number of nodes in this hidden layer.
  • Then I initialized the output layer, which consists of a single node that gives the prediction.
  • Once the CNN model is built, it is time to compile it:
  • Here we used the following parameters:
  1. Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on training data.
  2. Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.
  3. Finally, the metrics parameter chooses the performance metric.
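Put together, the model-building and compile steps described above could look roughly like the sketch below. The 32 filters, 3×3 kernel, 64×64×3 input, ReLU, and Adam come from the text; the 128 hidden units, the sigmoid output, the `binary_crossentropy` loss string, and the `tensorflow.keras` import path are assumptions. An explicit `Input` layer is used here in place of passing the input shape to Conv2D.

```python
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D
from tensorflow.keras.models import Sequential

IMG_SIZE = 64  # input resolution mentioned in the text


def build_classifier(img_size=IMG_SIZE):
    """Small CNN: one convolution block followed by two dense layers."""
    classifier = Sequential([
        Input(shape=(img_size, img_size, 3)),    # 64x64 RGB image
        Conv2D(32, (3, 3), activation="relu"),   # 32 filters of size 3x3
        MaxPooling2D(pool_size=(2, 2)),          # halve each spatial dimension
        Flatten(),                               # feature maps -> 1-D vector
        Dense(units=128, activation="relu"),     # hidden layer (size assumed)
        Dense(units=1, activation="sigmoid"),    # single output probability
    ])
    classifier.compile(
        optimizer="adam",
        loss="binary_crossentropy",   # assumed: two-class cross-entropy
        metrics=["accuracy"],
    )
    return classifier


classifier = build_classifier()
```

A single sigmoid node pairs naturally with binary cross-entropy because the output can be read directly as the predicted probability of pneumonia.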
  • Before fitting the CNN to the image dataset, we pre-process and augment the images to reduce overfitting. For this task we used the Keras ImageDataGenerator with the following parameters:
  1. rescale: rescaling factor. Defaults to None. If None or 0, no rescaling is applied; otherwise the data is multiplied by the value provided (after all other transformations are applied).
  2. shear_range: Float. Shear intensity (shear angle in the counter-clockwise direction, in degrees). Shearing slants the image along one axis.
  3. zoom_range: Float or [lower, upper]. The range for random zoom.
  4. horizontal_flip: Boolean. Randomly flips inputs horizontally.
  5. flow_from_directory: a method of the generator rather than a constructor parameter. It takes the path to a directory and generates batches of augmented/normalized data.
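A possible configuration of the generators is sketched below. All numeric values (`1/255`, `0.2`, batch size 32) and the helper name `make_generator` are assumptions, since the original values are not shown. Note also that recent Keras releases mark ImageDataGenerator as deprecated in favour of other data-loading utilities.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = 64     # matches the 64x64 input described in the text
BATCH_SIZE = 32   # assumed value

# Random augmentation is applied to training images only (values assumed).
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,      # map pixel values from [0, 255] to [0, 1]
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)

# Validation and test images are only rescaled, never augmented.
test_datagen = ImageDataGenerator(rescale=1.0 / 255)


def make_generator(datagen, directory):
    """Yield (images, labels) batches from a folder with one subfolder per class."""
    return datagen.flow_from_directory(
        directory,
        target_size=(IMG_SIZE, IMG_SIZE),
        batch_size=BATCH_SIZE,
        class_mode="binary",   # one 0/1 label per image
    )
```

A call such as `make_generator(train_datagen, "chest_xray/train")` would then produce the training batches; the path is hypothetical and depends on where the dataset was unzipped.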
  • Now to fit the data to the CNN model:
  1. In the fit call, ‘steps_per_epoch’ is the number of batches drawn from the generator before an epoch is considered finished. I set it to the number of images in the training_set folder; note that covering every image once only requires the number of images divided by the batch size.
  2. ‘epochs’ is the number of passes over the training data. One epoch is finished when the network has been trained on every training sample exactly once, and training normally consists of more than one epoch.
  • Once you have built and trained your model, you can pass images to the classifier.predict() function (i.e. [modelname].predict()) to get predictions.
  • You can also save your model for future use with [modelname].save().
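Keras counts `steps_per_epoch` in batches, not in images, so a common choice is the number of training images divided by the batch size, rounded up. The arithmetic is sketched below; the image count, batch size, and epoch count are hypothetical placeholders, not the article's values.

```python
import math


def steps_for_one_pass(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


# Hypothetical numbers, for illustration only.
num_train_images = 5000
batch_size = 32
epochs = 10

steps = steps_for_one_pass(num_train_images, batch_size)
print(steps)            # 157 batches per epoch
print(steps * epochs)   # 1570 weight updates over the whole run
```

Setting `steps_per_epoch` to the raw image count instead would make each "epoch" draw about `batch_size` times more batches than one pass over the data needs.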
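Prediction and saving might look like the following sketch. To keep it runnable on its own, it uses an untrained stand-in model and a random array in place of the trained classifier and a real X-ray. The file name `pneumonia_cnn.h5`, the 0.5 threshold, and the label mapping (which assumes class folders named NORMAL and PNEUMONIA, sorted alphabetically) are assumptions.

```python
import os
import tempfile

import numpy as np
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Sequential, load_model

# Untrained stand-in for the trained classifier (illustration only).
classifier = Sequential([
    Input(shape=(64, 64, 3)),
    Flatten(),
    Dense(1, activation="sigmoid"),
])

# Stand-in for one rescaled 64x64 RGB X-ray; predict() expects a batch axis.
image = np.random.rand(64, 64, 3).astype("float32")
batch = np.expand_dims(image, axis=0)   # shape (1, 64, 64, 3)

probability = float(classifier.predict(batch, verbose=0)[0][0])
# Assumed mapping: NORMAL -> 0, PNEUMONIA -> 1.
label = "Pneumonia" if probability >= 0.5 else "Normal"
print(label, probability)

# Save the model so it survives a runtime reset, then load it back.
model_path = os.path.join(tempfile.mkdtemp(), "pneumonia_cnn.h5")
classifier.save(model_path)
restored = load_model(model_path, compile=False)
```

In Colab the saved file still lives on the temporary runtime, so it needs to be downloaded or copied to persistent storage before the session ends.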

The results

If you are interested in the code, you can check out GitHub

I was able to achieve:

Accuracy: 89.74358974358975%

Precision: 88.80952380952381%

Recall: 95.64102564102565%
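For reference, the three metrics are computed from the confusion-matrix counts as sketched below. The counts used here are not given in the article: they are back-calculated as one set of values that reproduces the reported percentages, and should be read as an illustration rather than the actual test results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are right
    precision = tp / (tp + fp)                   # share of predicted positives that are right
    recall = tp / (tp + fn)                      # share of actual positives that are found
    return accuracy, precision, recall


# Inferred counts (assumption): pneumonia is the positive class.
accuracy, precision, recall = classification_metrics(tp=373, fp=47, fn=17, tn=187)

print(f"Accuracy:  {accuracy:.2%}")    # 89.74%
print(f"Precision: {precision:.2%}")   # 88.81%
print(f"Recall:    {recall:.2%}")      # 95.64%
```

High recall with lower precision means the model misses few pneumonia cases but flags a fair number of healthy patients, which is usually the safer direction for a screening tool.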

Higher accuracy may be achievable by changing the number of layers in the network or by tuning hyperparameters such as the learning rate, batch size, and number of epochs.


It is fairly easy for any developer with decent programming skills to create machine learning models that could be useful to millions of people.

Professionals have achieved much better results. As a beginner, I achieved an accuracy of about 89.7%, which is not bad. But for real-world use by millions of people, that is not enough: an error rate of roughly 10% means about 100,000 misdiagnoses for every million cases.

With future development and improvements in ML technology, and most importantly with more and more people working on it, we will be able to solve more such problems.