Convolutional Neural Networks for Medical Image Classification

Source: Deep Learning on Medium


In October 2018, Illuin Technology had the chance to collaborate with the French Gustave Roussy Institute by participating in an AI-oriented radiology challenge for the 2018 edition of the JFR (Journées Francophones de Radiologie).

More specifically, the goal was to train AI models to predict, based on radiological images (MRI, CT scans, etc.), various pathologies such as meniscus fissures, breast cancer tumours, kidney dysfunctions, or thyroid cartilage and liver lesions.

This article details the challenges we faced and the technical approaches we developed. The main topics covered are:

  • Goals & General setup of the prediction tasks
  • Data overview
  • Image preprocessing
  • Neural network training
  • Results

Goals & Setup

We focused on analysing knee MRI images of the menisci. These menisci potentially contain a fissure, which we have to characterize. More precisely, the goal was to work on three binary classification tasks:

  • Detecting the presence of a meniscal fissure on the image: present or absent.
  • Understanding its orientation: is the fissure horizontal or vertical?
  • Understanding its position: is the fissure located in the anterior or posterior part of the knee structure?

Overview of a horizontal, anterior meniscus fissure

For instance, on the knee MRI above, there is a horizontal-oriented meniscal fissure, located on the anterior knee area.

To tackle this project, we explored several directions to build the best model. We mainly compared two approaches:

  • Option 1: Segmenting the two meniscus zones (small triangles) where meniscal tears are to be found, and analysing the extracted zones
  • Option 2: Processing the image entirely, without any segmentation

The second option was selected, as it was a more straightforward approach. We also assumed that keeping the whole bone structure would help in determining the position (anterior or posterior) of the fissure. We first applied preprocessing operations to the images, before training a convolutional neural network for each label.

Data Overview

The data consists of 1128 meniscus images. Out of these 1128 knee images, 25% contain a fissure. Among these 282 fissure images, 9% are located on the anterior part of the knee and 91% on the posterior part, while the horizontal/vertical distribution of these fissures is more balanced (60%/40%).

One image sample, with a horizontal fissure

A few images contain a double meniscal fissure (on both the anterior and posterior parts of the knee structure). A few other images do not have the expected size of (256, 256). These images are set aside so that the training task remains consistent.

Training, validation and test sets were created with 70%, 20% and 10% of the images respectively.
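
Such a split can be sketched as a shuffled index partition; the 70/20/10 ratios and the 1128-image count come from the article, while the shuffling and seed below are illustrative assumptions:

```python
import numpy as np

def split_indices(n_images, seed=0):
    """Partition image indices into 70% train, 20% validation, 10% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_images)  # shuffle before splitting
    n_train = int(0.7 * n_images)
    n_val = int(0.2 * n_images)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(1128)
```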

Image Preprocessing

In order to highlight features that are useful for medical diagnosis, various computer vision operations are applied.

Filtering

First, filtering processes allow edge detection and contrast enhancement. Shown hereafter are various filtering operations that aim at highlighting the anatomical structures of the given knee: Sobel, Laplace and Scharr filters.

Various filtering processes on meniscus images

The Sobel filter achieved great results on edge detection: it efficiently highlighted the main bones and the two knee menisci. When benchmarking these 3 filtering processes, the Sobel filter had the most positive impact on our classification performance. The filtered image is added to the original image as a 2nd channel, so that the original information is preserved.
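
A minimal sketch of this channel stacking, using `scipy.ndimage.sobel` as a stand-in (the exact filter implementation used in the project is not specified):

```python
import numpy as np
from scipy import ndimage

def add_sobel_channel(image):
    """Stack a Sobel gradient-magnitude map onto the original image.

    image: (256, 256) grayscale array -> returns a (2, 256, 256) array.
    """
    image = image.astype(float)
    dx = ndimage.sobel(image, axis=0)  # vertical gradients
    dy = ndimage.sobel(image, axis=1)  # horizontal gradients
    edges = np.hypot(dx, dy)           # gradient magnitude highlights edges
    return np.stack([image, edges], axis=0)

x = add_sobel_channel(np.random.rand(256, 256))
```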

Normalization

Moreover, as images can have very different pixel value distributions, sample-wise image standardization is applied to balance out varying brightness and contrast levels.

Sample-wise image standardization
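
Sample-wise standardization simply centres and scales each image by its own statistics; a sketch:

```python
import numpy as np

def standardize(image, eps=1e-8):
    """Zero-mean, unit-variance normalization computed per image."""
    image = image.astype(float)
    return (image - image.mean()) / (image.std() + eps)

z = standardize(np.random.rand(256, 256) * 255)
```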

Data Augmentation

Also, as the training dataset is quite small, a robust data augmentation strategy is required:

  • Random image rotation, between -45° and +45°
  • Random horizontal translation, between -20% and +20% of the image width
  • Random vertical translation, between -20% and +20% of the image height

We kept the augmentation amplitudes quite low, to avoid confusing the networks too much, especially on the position and orientation predictions. Indeed, a vertically oriented fissure becomes horizontal with a 90° rotation, and anterior and posterior locations can be confused after a 180° rotation. Of course, the surrounding bone structures can help in assessing the correct answer, but we preferred keeping it simple.
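
The three operations above can be sketched with `scipy.ndimage`; the article does not name its augmentation library (Keras' `ImageDataGenerator`, for instance, exposes equivalent rotation and shift parameters), so this is only an illustrative stand-in:

```python
import numpy as np
from scipy import ndimage

def augment(image, rng):
    """Random rotation (+/-45 deg) and translation (+/-20% of each axis)."""
    angle = rng.uniform(-45, 45)
    shift_y = rng.uniform(-0.2, 0.2) * image.shape[0]
    shift_x = rng.uniform(-0.2, 0.2) * image.shape[1]
    out = ndimage.rotate(image, angle, reshape=False, mode="nearest")
    out = ndimage.shift(out, (shift_y, shift_x), mode="nearest")
    return out

rng = np.random.default_rng(0)
aug = augment(np.random.rand(256, 256), rng)
```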

Various augmented versions are shown on the figure below.

Let’s now look at how we trained networks for these 3 classification tasks.

Neural Network Training

Multi-task paradigm

The training paradigm for achieving the 3 predictions is not straightforward. A first option would consist in training a multi-label classification algorithm. But the 3 labels are interdependent: indeed, we don’t want to predict the orientation and location of a meniscus fissure if the predicted label for its presence is False.

Hence, we chose to train 3 separate neural networks: the fissure detection network, the orientation classification network, and the position classification network.

The fissure detection network was trained with the whole dataset, while the orientation and position networks were fed only with training images that contained a meniscal fissure. At inference time, these networks are used only if the fissure detection network classifies a Yes for the fissure presence.
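
The resulting inference cascade is simple to express; a sketch with hypothetical `predict_*` callables standing in for the three trained networks (the 0.5 threshold and the probability-to-label mapping are illustrative assumptions):

```python
def diagnose(image, predict_fissure, predict_orientation, predict_position,
             threshold=0.5):
    """Run the orientation/position networks only when a fissure is detected."""
    if predict_fissure(image) < threshold:
        return {"fissure": False}
    return {
        "fissure": True,
        "orientation": "horizontal" if predict_orientation(image) < threshold
                       else "vertical",
        "position": "anterior" if predict_position(image) < threshold
                    else "posterior",
    }
```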

A more complex setup would be to train the 3 models with a common image-representation block. The feed-forward pass of each model would enrich a common representation, instead of training three separate representations. This solution requires some implementation time and effort, in order to back-propagate the gradients of the three models separately through a shared image representation. This setup has not been implemented so far, and is an important part of our future research work.

Network architectures

As a reminder, we feed the networks with the original image, stacked with its Sobel-filtered version. The input image is hence of shape (2, 256, 256).

The 3 networks are quite similar. They are based on simple convolutional and pooling layers that extract meaningful features from the images. At the end, three fully-connected layers are used, including one final unit for binary classification. Dropout is used on both the dense layers and the convolutional blocks to maintain a good generalisation capability. The precise setup for each network is shown in the figure below.

  • Fissure detection network: 3 convolutional blocks (kernel widths: 7, 5, 3), 3 dense layers (32, 16, 1)
  • Orientation network: 4 convolutional blocks (kernel widths: 7, 5, 3, 3), 3 dense layers (32, 16, 1), trained only on images with fissure=1
  • Position network: 4 convolutional blocks (kernel widths: 7, 5, 3, 3), 3 dense layers (32, 16, 1), trained only on images with fissure=1
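
As a rough sanity check on these depths, the spatial size of the feature maps can be traced through the blocks. The article does not specify the padding or pooling settings, so the sketch below assumes 'valid' convolutions and 2×2 max pooling per block:

```python
def feature_map_size(input_size, kernel_widths, pool=2):
    """Spatial size after a stack of conv ('valid' padding) + 2x2 pool blocks."""
    size = input_size
    for k in kernel_widths:
        size = size - k + 1  # a valid convolution shrinks the map by k - 1
        size //= pool        # max pooling halves the size
    return size

detection = feature_map_size(256, [7, 5, 3])       # 3-block network
orientation = feature_map_size(256, [7, 5, 3, 3])  # 4-block networks
```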

We experimented with transfer learning from ResNet50 and VGG16/19 representations, but it did not bring better results. It appears that medical images are too specific, and hence too far from what pre-trained image representation models already know. We also experimented with CNN-based auto-encoder representations, which proved rather inconclusive.

Training

The 3 models were trained with the standard binary cross-entropy loss function. We benchmarked a large grid of hyper-parameters. The final set of parameters we chose was:

  • Batch size: 32
  • Learning rate: 0.001
  • Optimizer: Adam
  • 80 training steps per epoch, 20 validation steps
  • Dense dropout rate: 0.4
  • Convolutional dropout rate: 0.2

Moreover, the learning rate is dynamically adapted to the training behaviour: it is regularly halved based on the evolution of the AUC metric (Area Under the ROC Curve). Overfitting still occurs after 2500 to 3000 iterations (~35 epochs). We hence applied an early stopping policy to stop training right before overfitting starts.
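
A pure-Python sketch of this schedule: halve the learning rate when the validation AUC stops improving, and stop early after sustained stagnation. The patience values are illustrative assumptions, not figures from the article:

```python
def schedule(val_aucs, lr=0.001, lr_patience=3, stop_patience=8):
    """Replay a validation-AUC history, halving the LR on plateaus."""
    best, since_best = float("-inf"), 0
    for auc in val_aucs:
        if auc > best:
            best, since_best = auc, 0
        else:
            since_best += 1
            if since_best % lr_patience == 0:
                lr /= 2  # halve the learning rate on a plateau
        if since_best >= stop_patience:
            return lr, True  # early stop triggered
    return lr, False

final_lr, stopped = schedule(
    [0.70, 0.75, 0.80, 0.80, 0.79, 0.80, 0.80, 0.79, 0.78, 0.80, 0.79]
)
```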

Training (in grey) and validation (in orange) curves are shown in the plots below, including accuracy, AUC and loss evolutions.

Training metrics for meniscal tear detection

Results

Inference process

At inference time, images are also augmented, but with a smaller amplitude for the rotation operation, to minimize the impact on the original image. An ensembling (voting) strategy is also applied: the models are run on the original image plus 5 randomly augmented versions of it, and the predicted values are averaged for the final prediction.
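
A sketch of this voting scheme, assuming a `model` callable returning a probability and an `augment` function like the one used at training time (both names are hypothetical):

```python
import numpy as np

def ensemble_predict(model, image, augment, n_augmented=5):
    """Average predictions over the original image and 5 augmented copies."""
    versions = [image] + [augment(image) for _ in range(n_augmented)]
    return float(np.mean([model(v) for v in versions]))

# Usage with stand-in callables (identity augmentation, mean-pixel "model"):
prob = ensemble_predict(lambda img: img.mean(), np.full((256, 256), 0.3),
                        lambda img: img)
```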

Final performances

The JFR DataChallenge team assessed performance with a score based on the AUC values of each prediction task. More precisely, a weighted average of the 3 AUC scores is computed, with a 40% weight for the fissure detection task and 30% for each of the two other tasks.
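
The global score is thus a simple weighted average; for instance:

```python
def global_score(auc_fissure, auc_orientation, auc_position):
    """Challenge score: 40% fissure detection, 30% for each other task."""
    return 0.4 * auc_fissure + 0.3 * auc_orientation + 0.3 * auc_position

# Illustrative AUC values, not the article's results:
score = global_score(0.90, 0.90, 0.80)
```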

On our own test sets, we achieved task-wise AUC scores of :

  • 92.22% for the fissure detection network
  • 92.30% for the orientation network
  • 82.14% for the position network

These scores correspond to a global score of 89.72%.

An evaluation set was released at the very end of the challenge to compute the final team scores. On this final set, we achieved a score of 84.6%, reaching the 3rd best score (out of 11 teams).

Conclusion & Next steps

More research work is underway. The main next steps consist of:

  • Training a common image representation with shared training processes
  • Deep diving on image segmentation neural networks to extract meniscus areas before classification
  • Exploring meniscus area segmentation also with classical computer vision algorithms (edge detection, dilation, erosion, thresholding, etc.)
  • Enriching the filtering pipeline
  • Working on deeper augmentation generators with blurring, noising or contrast operations

As a wrap-up, we achieved good classification performance on these radiological diagnosis tasks. Strong filtering strategies and deep learning models both contributed to these promising results.

Future research work on AI-based healthcare diagnosis tools is planned for the upcoming AI Challenge for Health (#AIParisRegion, https://www.aichallenge.parisregion.eu/health-challenge).