Towards explainable AI for healthcare: Predicting and visualizing age in Chest Radiographs

Source: Deep Learning on Medium


I recently published a paper in SPIE 2019 describing a system that estimates a person’s age from Chest X-Rays (CXR) using deep learning. Such a system can be utilized in scenarios where the age information of the patient is missing; forensics is one example of an area that could benefit.

More interestingly though, deep network activation maps let us visualize which anatomical areas of CXRs age affects most, offering insight into what the network “sees” when it estimates age.

It might be too early to tell whether age estimation and visualization on CXRs will have clinical implications. Nevertheless, a discrepancy between the network’s predicted age and the patient’s real age could be useful for preventative counseling on patient health status.

Excerpts from the paper as well as new experiments are provided in this post.


Introduction

Estimating a person’s age, or an organ’s age, from medical images is not new and is often useful for clinical and forensic purposes. Since 1937, hand X-rays have been used to estimate a person’s bone age to evaluate endocrine growth disorders in the pediatric population [1]. Other analogous examples include T-scores for bone density from DEXA scans (which decrease with age) and coronary artery calcium scores from computed tomography (CT) scans (which increase with age). Radiologists may also report that a patient’s brain CT shows “chronic ischemic micro-vascular changes and atrophy out of proportion to the patient’s age”. In other words, various medical imaging modalities contain visual features of a person’s internal anatomical structures or organs. These features are apparent to the human eye and often correlate with expected biological age.

This observed correlation between imaging features and a person’s age makes the problem potentially solvable and interesting to computer vision researchers. We imagine that as computer vision research moves towards analyzing multiple imaging modalities (e.g. X-ray, CT, MRI, etc.) of ever-increasing image quality, one potentially useful output would be the computer’s estimate of the patient’s age at the person level, and potentially for each organ individually.

Dataset

At the time of writing this paper the largest publicly available CXR dataset was the NIH ChestX-ray8. It contains more than 110,000 frontal CXR images from ~30,000 unique individuals. It also comes with metadata containing the following information for each image: 1) Image Index, 2) Finding Labels, 3) Follow-up Visit Number, 4) Patient ID, 5) Patient Age, 6) Patient Gender, 7) View Position, 8) Original Image Width and Height, 9) Original Image Pixel Spacing.

Figure 1. Age distribution for NIH ChestX-ray8 dataset (x axis: age in years, y: frequency of images)

Figure 1 shows the distribution of patient age (1 to 90 years old) across the whole dataset after removing just 19 outliers with values above 90 years old.

Experiments

We started by splitting the dataset into 80% training, 10% validation and 10% testing based on the patient ID, to avoid any patient overlap between the training/validation/testing sets.
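A patient-level split like this can be sketched as follows. This is a minimal sketch, assuming flat lists of image IDs and their corresponding patient IDs; the paper’s actual preprocessing code is not shown here.

```python
import random

def split_by_patient(image_ids, patient_ids, seed=42):
    """Split images 80/10/10 by patient so no patient spans two sets."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    n = len(patients)
    train_p = set(patients[: int(0.8 * n)])
    val_p = set(patients[int(0.8 * n): int(0.9 * n)])
    # remaining patients fall through to the test set
    splits = {"train": [], "val": [], "test": []}
    for img, pid in zip(image_ids, patient_ids):
        if pid in train_p:
            splits["train"].append(img)
        elif pid in val_p:
            splits["val"].append(img)
        else:
            splits["test"].append(img)
    return splits
```

Splitting on patient ID rather than on images is important: the same patient often has multiple follow-up images, and letting them appear in both training and testing would leak information.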

Regression

We trained a DenseNet 169 (non-pretrained) regression network with raw CXR images as input and age values normalized to (0, 1] as output. The activation of the output node was set to sigmoid.
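Mapping ages into (0, 1] lets the sigmoid output cover the full label range. A minimal sketch of this normalization, assuming ages are simply divided by the dataset’s 90-year cap (the paper’s exact scaling may differ):

```python
MAX_AGE = 90.0  # dataset is capped at 90 years after outlier removal

def normalize_age(age_years):
    """Map an age in (0, 90] to a regression target in (0, 1]."""
    return age_years / MAX_AGE

def denormalize_age(prediction):
    """Map a sigmoid output back to years."""
    return prediction * MAX_AGE
```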

  • First round of experiments: we used mean squared error (MSE) as the loss for this round.
Training and validation loss for PA view using the MSE loss. Minimum validation loss: 0.003
  • Second round of experiments: we also used the coefficient of determination, R², as the loss, because for a regression problem we wanted to assess the goodness of fit of the trained network.
Training and validation loss for PA view using the R² loss. Minimum validation loss: 0.90
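Since training frameworks minimize a loss while R² is maximized for a good fit, R² is typically recast as the ratio of residual to total sum of squares (i.e. 1 − R²). A framework-agnostic NumPy sketch of this idea (the paper’s actual Keras loss is not shown):

```python
import numpy as np

def r2_loss(y_true, y_pred):
    """1 - R^2: minimizing this loss maximizes the coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return ss_res / (ss_tot + 1e-8)  # epsilon guards against constant targets
```

A perfect fit gives a loss near 0, while always predicting the mean gives a loss near 1.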

Classification

As another approach we binned the age values into 9 age groups: (0, 10], (10, 20], (20, 30], (30, 40], (40, 50], (50, 60], (60, 70], (70, 80], (80, 90]. We trained a DenseNet 201 (non-pretrained) classifier with these age groups as targets and raw CXR images as input. The activation of the network’s output layer was set to softmax. The classifier’s performance is demonstrated through the ROC curve below.
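The age-to-group assignment for these half-open bins can be sketched as follows (a minimal sketch; the paper’s preprocessing code is not shown):

```python
import math

def age_to_group(age_years):
    """Map an age in (0, 90] to a class index 0..8 for bins (0,10], ..., (80,90]."""
    if not 0 < age_years <= 90:
        raise ValueError("age must be in (0, 90]")
    # upper edges are inclusive: 10 -> bin 0, 10.5 -> bin 1, 90 -> bin 8
    return math.ceil(age_years / 10) - 1
```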

DenseNet 201 ROC for 9 age groups: (0, 10], (10, 20], (20, 30], (30,40], (40,50], (50,60], (60,70], (70,80], (80,90]

We also tried a pretrained DenseNet 201, but the performance was very similar.

Tell me what you see

Overall, the performance of the classification network was satisfying, which led us to investigate saliency maps to better understand what the network “sees” during classification. To this end we used the keras-vis library, which provides standard deep network visualization methods. The figures below show saliency maps from our trained DenseNet classifier for various age groups.

Average saliency map per age group on the testing set. Left: (0,10], Middle: (10,20], Right: (20,30]
Average saliency map per age group on the testing set. Left: (30,40], Middle: (40,50], Right: (50,60]
Average saliency map per age group on the testing set. Left: (60,70], Right: (70,80]
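The per-group averages shown above can be produced by averaging the individual saliency maps over the test images of each group. A minimal NumPy sketch, assuming the per-image saliency maps have already been extracted (the keras-vis extraction step itself is not shown):

```python
import numpy as np

def average_saliency_per_group(saliency_maps, group_labels, n_groups=9):
    """Average 2-D saliency maps over all test images in each age group.

    saliency_maps: array of shape (n_images, H, W)
    group_labels:  array of shape (n_images,) with values in [0, n_groups)
    Returns a dict {group_index: mean HxW map} for non-empty groups.
    """
    saliency_maps = np.asarray(saliency_maps, dtype=float)
    group_labels = np.asarray(group_labels)
    averages = {}
    for g in range(n_groups):
        mask = group_labels == g
        if mask.any():  # skip groups with no test images
            averages[g] = saliency_maps[mask].mean(axis=0)
    return averages
```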

Interestingly, and perhaps not surprising clinically, as patient age progresses the salient areas shift from multiple, varied anatomical regions to a more constrained region around the aortic arch and the mediastinum. Having consulted our clinicians on these results, we believe this relates to the enlargement and calcification of the aortic arch and mediastinum as a patient gets older.

Conclusion

This work set the stage for the next research directions we would like to follow. More specifically, although the activation maps can help visualize areas predictive of age in CXRs, the real clinical value lies in exploring how these maps can be utilized in disease classifiers, so that we can identify how much of an abnormal region is due to natural age progression and how much is due to a pathology/disease. We believe this extra information can help build better disease classifiers, such as classifying whether an image is normal or abnormal “for age”. The information may also help quantify the degree of physiologic and pathological processes present in patients. Examples of the latter include, but are not limited to, osteoarthritis or degenerative disc disease of the musculoskeletal system, senescent or emphysematous changes of the lungs, and calcific atherosclerosis of the aorta and vasculature. Indeed, as algorithms improve, it is possible that previously unknown but more reliable markers of aging or pathology will be identified. In the clinical setting, a physician may find in the predicted age of various organ systems, or in the overall predicted age, an opportunity to counsel the patient on improving health habits.