Source: Deep Learning on Medium
This article is part of the “Deep Learning in Practice” series.
What is MURA?
(source) MURA (MUsculoskeletal RAdiographs) is a large dataset of bone X-rays that can be used to build models that determine whether an X-ray study is normal or abnormal (the dataset could also be used to classify bones into the categories shoulder, humerus, elbow, forearm, wrist, hand, and finger). MURA is one of the largest public radiographic image datasets.
Musculoskeletal conditions affect more than 1.7 billion people worldwide, and are the most common cause of severe, long-term pain and disability, with 30 million emergency department visits annually and increasing. The Stanford ML Group hopes that their dataset can lead to significant advances in medical imaging technologies which can diagnose at the level of experts, towards improving healthcare access in parts of the world where access to skilled radiologists is limited.
This dataset is available to the community, and the Stanford ML Group is holding a competition to determine whether the models created can perform as well as radiologists on the task (note: read the MURA Submission Tutorial to learn how to submit your results for official evaluation).
The objective of the MURA competition is to classify every study, not every image, as normal or abnormal (binary predictions). The best radiologist performance reported by Stanford University is 0.778.
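Because the prediction is made per study and not per image, the per-image outputs of a model have to be combined in some way. The sketch below shows one simple option: average the abnormality probabilities of all images in a study and threshold the mean. The averaging rule, the 0.5 threshold, the helper names and the example paths are all assumptions made for illustration, not necessarily what was done in this article.

```python
from collections import defaultdict
from statistics import mean


def study_of(image_path):
    """Return the study folder of an image path (everything before the file name)."""
    return image_path.rsplit("/", 1)[0]


def predict_studies(image_probs, threshold=0.5):
    """Average per-image abnormality probabilities within each study, then threshold.

    image_probs maps an image path to the predicted probability of "abnormal".
    Returns a dict mapping each study folder to 1 (abnormal) or 0 (normal).
    """
    by_study = defaultdict(list)
    for path, prob in image_probs.items():
        by_study[study_of(path)].append(prob)
    return {study: int(mean(probs) > threshold) for study, probs in by_study.items()}


# Hypothetical per-image probabilities (paths are made up for the example).
image_probs = {
    "valid/XR_WRIST/patient00001/study1/image1.png": 0.9,
    "valid/XR_WRIST/patient00001/study1/image2.png": 0.4,
    "valid/XR_HAND/patient00002/study1/image1.png": 0.2,
}
study_preds = predict_studies(image_probs)
```

With this rule, a study with one confident abnormal view and one borderline view can still be labeled abnormal, which is why per-study accuracy can differ from per-image accuracy.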
Fastai v1 on the MURA dataset
(source: paper, May 2018) The MURA dataset contains 40,561 images from 14,863 studies. Each study contains one or more views (images) and is manually labeled by radiologists as either normal or abnormal.
These images are divided into 36,808 training images and 3,197 validation images (both grouped within studies). The remaining 556 images make up the test set, which is not publicly released.
We used 2 pretrained models: a simple one (resnet34) and a much deeper one (densenet169, the one used by the paper's authors), in order to show what a deeper pretrained network can bring to radiograph classification in healthcare.
For each model, we used the standard fastai v1 way for classification:
- use of a pretrained model,
- creation of an ImageDataBunch by the use of the function from_folder(),
- progressive resizing: the databunch image size is first halved (112) and then doubled back to full size (224),
- training of the last added layers first, then training of the whole model after unfreezing,
- use of the function lr_find() to choose a suitable learning rate,
- use of the function fit_one_cycle(), which trains with the 1cycle policy: the learning rate is varied over the course of training (and, when a range of learning rates is given, different layer groups get different rates),
- analysis of the results (predictions on the validation set) with the functions ClassificationInterpretation.from_learner(), interp.top_losses(), interp.plot_confusion_matrix(), interp.most_confused() and interp.plot_top_losses()
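To give an idea of what happens to the learning rate during the fit_one_cycle() step in the list above, here is a simplified, pure-Python sketch of a one-cycle schedule: the rate warms up to a maximum and then decays to a very small value, both along a cosine curve. This is not fastai's implementation; the function names and the constants (pct_start, div_factor, final_div) are illustrative assumptions.

```python
import math


def annealing_cos(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    cos_out = math.cos(math.pi * pct) + 1
    return end + (start - end) / 2 * cos_out


def one_cycle_lr(step, total_steps, lr_max,
                 pct_start=0.3, div_factor=25.0, final_div=1e4):
    """Learning rate at a given step of a simplified one-cycle schedule.

    Assumed shape (for illustration only):
      - warm-up from lr_max / div_factor to lr_max over the first pct_start
        fraction of the steps,
      - then decay from lr_max to lr_max / final_div over the remaining steps.
    """
    warmup_steps = round(total_steps * pct_start)
    lr_start = lr_max / div_factor
    lr_end = lr_max / final_div
    if step < warmup_steps:
        return annealing_cos(lr_start, lr_max, step / warmup_steps)
    pct = (step - warmup_steps) / max(1, total_steps - warmup_steps - 1)
    return annealing_cos(lr_max, lr_end, pct)


# Learning rate for each of 100 training steps with a peak of 1e-3.
schedule = [one_cycle_lr(s, 100, lr_max=1e-3) for s in range(100)]
```

In practice, the peak value lr_max is what one reads off the lr_find() plot; the schedule then takes care of the warm-up and the decay around that peak.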
Our 2 models (resnet34 and densenet169) exceeded the overall per-study accuracy of the paper's model. Broken down by category, our best model (densenet169 with image size = 224) fails to beat the paper's model in only 2 of the 7 categories: hand and wrist (see the comparison table and the diagram below).
The overall accuracy of our model (densenet169) is 0.829, which would put us in 4th place in the MURA competition (see the screenshot of the MURA Competition leaderboard below).