Original article can be found here (source): Deep Learning on Medium
Modern Visual RecSys Part4b: COVID-19 Case Study with CNN
In this case study, we explore COVID-19 X-ray images with the same Convolutional Neural Network RecSys flow we set up in the previous CNN chapter. We aim to identify clusters of X-ray images with similar infection severity using Approximate Nearest Neighbors. We swap out the training data and employ a more powerful pre-trained model (ResNet152); the rest of the code remains identical to what we used for the DeepFashion images. This work is meant as a proof of concept of how the framework we developed can be applied to a completely different domain.
This work is not intended as medical research, nor is it representative of how CNNs can be used to detect COVID-19.
Explore the series
- How does a Recommender Work? [Foundational]
- How to Design a Recommender? [Foundational]
- Intro to Visual RecSys [Core]
- Convolutional Neural Networks Recommender [Pro]
- COVID-19 Case Study with CNN [Pro][we are here]
- Advanced topics — Visual Understanding [Pro](coming soon)
- Advanced topics — Temporal Modeling [Pro](coming soon)
- Conclusion and next steps [Foundational](coming soon)
- Foundational: general knowledge and theories, minimum coding experience needed.
- Core: more challenging materials with basic coding.
- Pro: difficult materials and code, with production-grade tools.
The COVID-19 Data
Intuition for why a CNN should work well on this data set:
As outlined in the previous chapter, the strength of a CNN lies in its convolutional filters. These filters are very good at detecting shapes, lines, and boundaries within an image. In the X-ray images, we see that as the infection worsens, the image blurs with more white areas and the rib cage becomes less visible; these are visual cues that a CNN can pick up and learn.
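To make the idea of a filter responding to a boundary concrete, here is a minimal sketch in plain NumPy. The toy 6x6 image and the hand-picked Sobel kernel are assumptions for illustration only; a real CNN such as ResNet152 learns its own filters from data rather than using a fixed one.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like the convolution layers in most deep learning libraries, this does
    not flip the kernel, so strictly speaking it is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image (an assumption, not X-ray data): dark left half, bright right half,
# which gives exactly one vertical boundary in the middle.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Sobel kernel that responds to left-to-right changes in brightness.
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, sobel_x)
print(response)
```

The response is zero wherever the window covers a uniform region and non-zero only where it straddles the boundary, which is the kind of signal later layers can build on.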
Cleaning the data
- As there are fewer than 25 samples of ARDS, Pneumocystis, SARS & Streptococcus in total, I decided to remove those samples and keep only COVID and healthy samples.
- As there are fewer than 25 CT scans and only 1 CT scan for a healthy patient, I decided to remove the CT scans and keep only X-rays.
- After the cleaning, we have 102 COVID X-rays and 1,584 healthy X-rays.
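The filtering described above might look like the following sketch. The records, the field names (`finding`, `modality`) and the label strings are hypothetical stand-ins; the actual metadata file may use different columns and values.

```python
from collections import Counter

# Hypothetical metadata records; field names and values are assumptions.
records = [
    {"file": "a.png", "finding": "COVID-19", "modality": "X-ray"},
    {"file": "b.png", "finding": "COVID-19", "modality": "CT"},
    {"file": "c.png", "finding": "SARS", "modality": "X-ray"},
    {"file": "d.png", "finding": "healthy", "modality": "X-ray"},
    {"file": "e.png", "finding": "healthy", "modality": "X-ray"},
    {"file": "f.png", "finding": "ARDS", "modality": "X-ray"},
]

KEEP_FINDINGS = {"COVID-19", "healthy"}  # drop the rare classes
KEEP_MODALITY = "X-ray"                  # drop CT scans

def clean(rows):
    """Keep only COVID/healthy rows that are X-rays."""
    return [
        row for row in rows
        if row["finding"] in KEEP_FINDINGS and row["modality"] == KEEP_MODALITY
    ]

cleaned = clean(records)
counts = Counter(row["finding"] for row in cleaned)
print(counts)
```

Printing the per-class counts after cleaning is a quick check that the class imbalance (far more healthy than COVID samples) is what you expect before training.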
We will follow the exact same steps outlined in the previous Convolutional Neural Networks RecSys chapter (you can refer back to that chapter for more details):
1. Convert images to embeddings
2. Conduct Transfer Learning from ResNet152
3. Use Fastai hooks to retrieve image embeddings from step 2
4. Use Approximate Nearest Neighbors to obtain most similar images based on the embeddings from step 3.
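The last step, retrieving similar images from embeddings, can be illustrated with a small sketch. Note the substitution: this uses an exact brute-force cosine search in NumPy, not an approximate index, so it only shows what the approximate method is approximating. The 4-dimensional synthetic embeddings and the labels are made up for illustration; real embeddings would come from the hooked model.

```python
import numpy as np

def top_k_neighbors(embeddings, query_index, k):
    """Return indices of the k rows most cosine-similar to the query row.

    Exact brute-force search (a stand-in for an approximate index).
    Assumes no embedding row is all zeros.
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf  # never return the query itself
    order = np.argsort(-sims)    # most similar first
    return order[:k]

# Synthetic embeddings (an assumption): two tight, well-separated clusters.
rng = np.random.default_rng(0)
healthy = rng.normal(loc=[1.0, 0.0, 0.0, 0.0], scale=0.05, size=(5, 4))
infected = rng.normal(loc=[0.0, 1.0, 0.0, 0.0], scale=0.05, size=(5, 4))
embeddings = np.vstack([healthy, infected])
labels = ["healthy"] * 5 + ["infected"] * 5

neighbors = top_k_neighbors(embeddings, query_index=0, k=4)
print([labels[i] for i in neighbors])
```

Brute force compares the query against every stored embedding, which is fine for a couple of thousand X-rays; an approximate index trades a little accuracy for speed on much larger collections.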
For healthy X-ray scans, the 36 most similar X-rays our model retrieves are all healthy.
For infected X-ray scans, our model usually retrieves a mix of about 80% infected scans and 20% healthy scans.
For seriously infected X-ray scans, the 36 most similar X-rays our model retrieves are all infected.
Potential use case of this work
We can use this model to track the change in scan severity over time. If today's scan has fewer healthy neighboring scans than earlier scans and is drifting toward the seriously infected cluster, this is a sign that the patient's condition has worsened over time.
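One way to turn this idea into a number is to track the share of infected scans among each scan's nearest neighbors. The sketch below is an illustration under assumptions, not a clinical tool: the `timeline` data, the day labels, and the helper `infected_fraction` are all hypothetical.

```python
def infected_fraction(neighbor_labels):
    """Share of a scan's nearest neighbors that are labelled infected."""
    if not neighbor_labels:
        raise ValueError("need at least one neighbor")
    infected = sum(1 for label in neighbor_labels if label == "infected")
    return infected / len(neighbor_labels)

# Hypothetical neighbor labels (36 per scan) for one patient on three days.
timeline = [
    ("day 1", ["healthy"] * 30 + ["infected"] * 6),
    ("day 4", ["healthy"] * 18 + ["infected"] * 18),
    ("day 7", ["healthy"] * 4 + ["infected"] * 32),
]

fractions = [infected_fraction(labels) for _, labels in timeline]
drift = fractions[-1] - fractions[0]  # positive means moving toward infected

for (day, _), fraction in zip(timeline, fractions):
    print(f"{day}: {fraction:.2f} of neighbors infected")
print(f"drift: {drift:+.2f}")
```

A rising fraction across scans corresponds to the drift toward the seriously infected cluster described above; where to set an alert threshold would be a separate decision.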
Link to Colab (you just need a free Google Account to run the code on GPU in the cloud)
What have we learned
In this chapter, we explored the use of our previously developed CNN RecSys flow in the healthcare domain. We observed how we can train a powerful model with minimal changes to our code, showcasing the flexibility of our flow.