Original article was published by Maryam Vazirabad on Artificial Intelligence on Medium
Before training a model on a dataset, it helps to filter out images acquired in unwanted planes. I would like to create a simple classification model that determines the orientation of an image automatically, which will reduce the data preprocessing needed for future deep learning models.
I’ll start with a simple binary classification: determining whether an image plane is axial or not axial.
Task: Identifying axial images in CT exams (binary classification)
The dataset used to train the model is The Cancer Genome Atlas Ovarian Cancer (TCGA-OV), a data collection of CT images of the abdomen/pelvis in DICOM format. The data can be downloaded here.
More DICOM datasets can be found in The Cancer Imaging Archive (TCIA), an open-access database of medical images for cancer research.
2) Label data
Before the images can be used to train the deep learning model, they first need to be labeled.
An exam, also known as an imaging study, comprises a set of series. Each series includes a set of images, or Service-Object Pair Instances (SOP Instances), all of which are of the same imaging plane (coronal, sagittal, or axial). Therefore, the images will be labeled at the series level, and will have labels ‘Axial’ or ‘Not Axial’.
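Because each label applies to a whole series, every image (SOP Instance) in that series inherits it. The sketch below shows that propagation with pandas. The tables, UIDs, and the `label` column are hypothetical and are not the author's code.

```python
import pandas as pd

# Hypothetical image-level table: one row per SOP Instance (image) in each series.
images = pd.DataFrame({
    "StudyInstanceUID": ["study-1"] * 5,
    "SeriesInstanceUID": ["series-A", "series-A", "series-A", "series-B", "series-B"],
    "SOPInstanceUID": ["img-1", "img-2", "img-3", "img-4", "img-5"],
})

# Hypothetical series-level labels: one orientation label per series.
series_labels = pd.DataFrame({
    "SeriesInstanceUID": ["series-A", "series-B"],
    "label": ["Axial", "Not Axial"],
})

# Every image inherits the label of the series it belongs to.
# validate="many_to_one" raises an error if a series accidentally has more than one label row.
labeled_images = images.merge(
    series_labels, on="SeriesInstanceUID", how="left", validate="many_to_one"
)

print(labeled_images[["SOPInstanceUID", "label"]])
```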
In order to label the images, I’ll be using the MD.ai Annotator. MD.ai is a platform that facilitates the creation and deployment of medical deep learning projects with annotation tools, cloud services, Jupyter integration, and client libraries.
The MD.ai Annotator allows the user to import medical images from cloud storage. The user can create labels with different annotation modes and export annotations, images, and labels for training.
More information about the MD.ai Annotator can be found here.
The TCGA-OV dataset is imported into the Annotator and the series of each exam are labeled either ‘Axial’ or ‘Not Axial’, as shown below.
Now that the images are labeled, they can be used to train and validate my first Fast.ai model.
3) Installation and setup
Notebooks are an easy and fast way to get started with Fast.ai. One important note is that the Fast.ai library and its notebooks should run on a server with a GPU, because training a deep learning model on a GPU takes significantly less time than training it on a CPU.
I recommend using Google Colab notebooks, which require minimal installation and are free to use, including GPU use. To change the runtime to GPU in Colab, go to Runtime → Change Runtime Type → Set hardware accelerator to GPU.
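After changing the runtime type, you can confirm that the notebook actually sees the GPU. This is a minimal sketch that assumes PyTorch, the library fastai is built on, is available. The helper name `describe_device` is hypothetical.

```python
import importlib.util


def describe_device() -> str:
    """Report whether PyTorch (which fastai builds on) can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported lazily so the check also runs where torch is absent

    if torch.cuda.is_available():
        # Name of the first visible GPU, e.g. the one Colab assigns to the runtime
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No GPU visible: training would run on the CPU"


print(describe_device())
```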
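At the top of every notebook, I write these three lines for automatic reloading and inline plotting. Automatic reloading picks up changes to imported modules without restarting the kernel. Inline plotting makes your plot outputs appear, and be stored, within the notebook:
%reload_ext autoreload
%autoreload 2
%matplotlib inline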
I import all the necessary packages to process the data and use Fast.ai:
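!pip install fastai
# The star imports below follow the fastai v1 API (fastai v2 uses `from fastai.vision.all import *`)
from fastai import *
from fastai.vision import *
import pandas as pd
from pathlib import Path
from os.path import basename
import numpy as np
from glob import glob
import warnings
# Silence the repetitive UserWarnings emitted by torch.nn.functional during training
warnings.filterwarnings("ignore", category=UserWarning, module="torch.nn.functional")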
Before using Fast.ai, I suggest filtering these warnings out of the output, as the last line above does. Otherwise, the warnings can flood the output and slow down training considerably.
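The following sketch shows how narrowly this filter applies. It simulates two warnings. The first is attributed to `torch.nn.functional`, which the filter silences. The second comes from a hypothetical module `my_project.data` and still gets through. The `module` argument of `warnings.filterwarnings` is a regular expression that must match the start of the module name, so warnings raised elsewhere remain visible.

```python
import warnings


def surviving_warnings() -> list:
    """Return the warning messages that get past the torch.nn.functional filter."""
    with warnings.catch_warnings(record=True) as caught:
        # Show every warning by default...
        warnings.simplefilter("always")
        # ...except UserWarnings attributed to torch.nn.functional (same filter as in the notebook).
        warnings.filterwarnings("ignore", category=UserWarning, module="torch.nn.functional")
        # Simulate one warning from torch.nn.functional and one from a hypothetical module.
        warnings.warn_explicit("noisy torch warning", UserWarning, "functional.py", 1,
                               module="torch.nn.functional")
        warnings.warn_explicit("some other warning", UserWarning, "data.py", 1,
                               module="my_project.data")
    return [str(w.message) for w in caught]


print(surviving_warnings())  # ['some other warning']
```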
To download the annotations I created with the MD.ai Annotator, I’ll install and import the mdai client library:
!pip install --upgrade --quiet mdai
import mdai
4) Load and view data
I’ll create an mdai client to download the annotations from the project within the Annotator.
The project ID identifies the project to access. The mdai client also requires an access token, which authenticates you as a user. Note that my personal access token is hidden for privacy:
DOMAIN = 'public.md.ai'
project_id = "EoBKMNGm"
YOUR_TOKEN = ''  # paste your personal MD.ai access token here (left empty for privacy)
mdai_client = mdai.Client(domain=DOMAIN, access_token=YOUR_TOKEN)
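You can avoid pasting the token into the notebook and blanking it out before sharing by reading it from an environment variable instead. This is a minimal sketch. The variable name `MDAI_TOKEN` and the helper `read_access_token` are hypothetical and are not part of the mdai library.

```python
import os


def read_access_token(env_var: str = "MDAI_TOKEN") -> str:
    """Fetch the MD.ai access token from an environment variable.

    MDAI_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your MD.ai access token")
    return token


# Usage inside the notebook, after `import mdai`:
# mdai_client = mdai.Client(domain=DOMAIN, access_token=read_access_token())
```

In Colab, `getpass.getpass()` from the standard library also lets you enter the token at runtime without saving it in the notebook.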
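Creating a project object downloads the annotations and extracts them to the specified path, here the current directory. Setting annotations_only=True skips downloading the images themselves: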
mdai_client.project(project_id, path='.', annotations_only=True)
This downloads the annotations in JSON format. I want to convert this file into a pandas dataframe. Fortunately, the mdai library already has a helper function to achieve this:
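# Find the exported annotations JSON and convert it into pandas dataframes
JSON = Path.cwd().glob("**/*.json")
for j in JSON:
    # json_to_dataframe returns a dict of dataframes; if several JSON files match, the last one wins
    result = mdai.common_utils.json_to_dataframe(j)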
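The `**/*.json` pattern matches every JSON file below the current directory, and the loop keeps whichever match comes last. In Colab, that can include unrelated files such as `sample_data/anscombe.json`. The sketch below is more defensive and picks the newest matching file. It assumes the exported file name contains the project ID, so check the actual file name in your environment. `find_annotation_json` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def find_annotation_json(root: Path, project_id: str) -> Optional[Path]:
    """Return the newest JSON file under `root` whose name contains `project_id`.

    Assumption: the exported annotation file name includes the project id.
    Matching on it avoids picking up unrelated JSON files elsewhere in the tree.
    """
    candidates = [p for p in root.glob("**/*.json") if project_id in p.name]
    if not candidates:
        return None
    # If the project was exported more than once, prefer the most recently modified file.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

With this helper, the loop above could become `result = mdai.common_utils.json_to_dataframe(find_annotation_json(Path.cwd(), project_id))`.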
Now that I have a dataframe of my annotations, I can filter out annotations that aren’t in the label group “Orientation”, which is the group that has the plane label names for each image:
a = result['annotations']
select_orientation = a.loc[a['groupName'] == 'Orientation']
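The labels live at the series level, so a natural next step is a table with one row per series and a binary target. The sketch below uses a small hypothetical stand-in for `result['annotations']`. Besides `groupName`, it assumes `labelName` and `SeriesInstanceUID` columns, so check the real dataframe's columns before adapting it. It uses new variable names so it does not overwrite `a` or `select_orientation`. The numeric `is_axial` column is optional, because fastai can also use the string labels directly.

```python
import pandas as pd

# Hypothetical stand-in for result['annotations'] (column names partly assumed).
annotations = pd.DataFrame({
    "groupName": ["Orientation", "Orientation", "Orientation", "Other"],
    "labelName": ["Axial", "Not Axial", "Axial", "Contrast"],
    "SeriesInstanceUID": ["series-A", "series-B", "series-A", "series-A"],
})

orientation = annotations.loc[annotations["groupName"] == "Orientation"]

# One row per series: repeated identical labels on the same series collapse to one row.
series_labels = (
    orientation[["SeriesInstanceUID", "labelName"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Guard against a series that carries conflicting orientation labels.
conflicts = series_labels["SeriesInstanceUID"].duplicated(keep=False)
if conflicts.any():
    raise ValueError(f"Conflicting labels:\n{series_labels[conflicts]}")

# Optional binary target: 1 = Axial, 0 = Not Axial.
series_labels["is_axial"] = (series_labels["labelName"] == "Axial").astype(int)
print(series_labels)
```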