Diagnosing Pneumonia with Transfer Learning


[Image: chest x-ray of a child with pneumonia.]

DISCLAIMER! In no way, shape or form am I a medical professional.

For the past 15 weeks, I have gone through the Flatiron Immersive Data Science Program in Atlanta, Georgia. The journey was intense, but I learned more than I could have imagined. Over a year ago, when I started studying computer science at Auburn, I would look at the future paths a computer science major could take. I always saw artificial intelligence, but I never thought to pursue it; I never saw myself as intelligent enough to come up with a system that can mimic some of our brain's capabilities, until I set foot in the data science program.

This article is not about promoting my experience with Flatiron; it is about my final project and what I built. I have always been interested in AI because of its unimaginable capabilities, such as being able to diagnose medical conditions better than a doctor. As I went through the program, I knew we could do anything for our final project, so in my free time I would look for a dataset with the potential to diagnose a medical condition. I found one on Kaggle (a huge thanks to them): chest x-rays of children with pneumonia.

There were a lot of unknowns that I knew would be obstacles in this project: the computing power a convolutional neural network requires, the shortage of data, and the question of whether I could even build a CNN capable of doing what I needed. Along the way I overcame those obstacles and ended up with results better than I could have imagined. The Kaggle dataset came from a competition, and I unofficially beat the winner of that competition using a CPU instead of a GPU. I say "unofficially" because I have not had anyone from Kaggle, or any expert, check the results. When I first saw the numbers I didn't believe them; I thought it was a mistake, or that the network was overfitting. It wasn't.

The question going through your mind is probably, "How did he do it?" I began by examining the notebooks from the previous competition and investigating their results. Most of the competitors resized the images to the conventional 150 by 150 pixels. This did not make sense to me. If the goal is to predict whether or not a patient has pneumonia, you do not want to shrink the image that far, because you lose detail. These are medical images, and you want to be confident the CNN is seeing what it needs to see. If the image is unsuitably small, you might lose important information.
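As a rough sketch of what loading the data at a larger resolution looks like in Keras (the directory path, target size, and batch size here are illustrative placeholders, not my exact settings):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values to [0, 1] and resize the x-rays on the fly.
# The point is to keep the images large enough that the pathology
# stays visible, instead of collapsing them down to 150x150.
datagen = ImageDataGenerator(rescale=1.0 / 255)

train_gen = datagen.flow_from_directory(
    "chest_xray/train",       # placeholder path to the Kaggle dataset
    target_size=(224, 224),   # larger than the 150x150 many notebooks used
    color_mode="rgb",
    class_mode="binary",      # pneumonia vs. normal
    batch_size=32,
)
```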

At first, I had the mindset that I would not be able to match the competitors' accuracy because they had access to GPUs and I did not, and I did not want to simply reuse their techniques. I tried building my own CNN from scratch, but when it did train, it overfit and could only predict the x-rays that didn't have pneumonia; when it didn't train, my computer crashed. Neither outcome was what I wanted. At the time I was reshaping my images to 560 by 400, which is a huge input shape for a CNN.

I realized I should make the images smaller, and I turned to transfer learning. I had never tried it before, but I had read articles on it and watched tutorials on the code. I call the pre-trained network you don't train the parent network. All of the competitors were using the latest and greatest architectures, such as Xception and VGG16, to classify images. Those networks have so many parameters that I knew my little laptop was not fit to run them. I was forced to use the MobileNet model, which utilizes depthwise separable convolutions to reduce the amount of computing power required. Those convolutions are what make the model light enough to run on mobile devices like phones and maybe even IoT devices.

The competitors also were not optimizing their fully connected networks. More specifically, they were not using batch normalization or dropout layers to correct overfitting. I used these layers in my fully connected layers, and this is where I think I leapt ahead of the competitors and gained results that surpassed the winner.
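Here is a minimal sketch of that setup in Keras. I'm assuming the TensorFlow Keras API, and the layer widths, dropout rate, and input size are illustrative, not necessarily PneumoNet's exact values:

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import (BatchNormalization, Dense, Dropout,
                                     GlobalAveragePooling2D)
from tensorflow.keras.metrics import Recall
from tensorflow.keras.models import Model

# The frozen "parent" network: MobileNet pre-trained on ImageNet.
# include_top=False drops ImageNet's classifier so we can attach our own.
base = MobileNet(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
base.trainable = False  # don't train the parent network

# Fully connected head with batch normalization and dropout,
# the pieces the competitors were skipping.
x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation="relu")(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
out = Dense(1, activation="sigmoid")(x)  # binary: pneumonia vs. normal

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", Recall()])
# model.fit(train_gen, validation_data=val_gen, epochs=10)
```

Freezing the parent network means only the small fully connected head gets trained, which is what makes this feasible on a CPU.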

I named the network PneumoNet. I know, silly, but I couldn't come up with anything more clever. PneumoNet can diagnose pneumonia with an accuracy of 96.2% and a recall of 94.5%.
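Recall matters here because it is the fraction of actual pneumonia cases the network catches: TP / (TP + FN). If you want to check both numbers yourself, a quick scikit-learn sketch (the labels below are toy values, not my test set) looks like this:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Toy labels just to show the calls; in practice y_true comes from the
# test set and y_pred from thresholding model.predict() at 0.5
# (1 = pneumonia, 0 = normal).
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 1])

print("accuracy:", accuracy_score(y_true, y_pred))  # fraction correct
print("recall:  ", recall_score(y_true, y_pred))    # TP / (TP + FN)
```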

A final disclaimer:

I'm not claiming to be the winner, because no one has reviewed my code and network or done a peer review, and this competition ended over two years ago. And again: I am not a doctor or medical professional.

If you have any questions, please contact me, and if you would like to, connect with me on LinkedIn. My GitHub repo for this project is also available. Take a dive in and please figure out if I did something wrong; I want as many eyes on it as possible.