Automatic Impression Generation from Medical Imaging Report

Table of Contents

1 Business Problem
2 Introduction about the Dataset
3 Prerequisite
4 Existing Research-Papers/Solutions
5 My Approach — Solution
6 XML Parsing Creating Data Points
7 Data Preprocessing
8 Exploratory Data Analysis
9 Data point construction
10 Train Test and Validation split
11 Tokenization and Dataset preparation
12 Basic Model [CNN]
13 Main Model [CNN-BiLSTM]
14 Conclusion
15 Error Analysis
16 Future work
17 References

1. Business Problem

The problem statement here is to generate the impression from given chest X-ray images. These images come in two views of the chest, frontal and lateral. With these two views as input, we need to generate the impression for a given X-ray.

To solve this problem, we will build a predictive model that involves both image and text processing. Automatically describing the content of a given image is a recent artificial intelligence task that connects computer vision and natural language processing.

2. Introduction about the Dataset

Open-i chest X-ray collection from Indiana University

This dataset contains 7,470 chest X-rays with 3,955 radiology reports from the Indiana University hospital network.

- Images are downloaded in PNG format.
- Reports are downloaded in XML format.

Each XML file is the report for the corresponding patient. To identify the images associated with a report, we need to check the id attribute of the <parentImage id="image-id"> tag; the id holds the image name corresponding to the PNG file. More than one image can be associated with one report.

Original data source: https://openi.nlm.nih.gov/

Other Resources: https://www.kaggle.com/raddar/chest-xrays-indiana-university

Sample Data point:

[Image: sample Open-i NLM data point]

3. Prerequisite

Before we go deep into this work, I assume that you are familiar with the following deep learning concepts and Python libraries.

Convolutional Neural Networks, Recurrent Neural Networks, LSTMs, transfer learning, activation functions, optimization techniques like SGD and Adam, and loss functions like categorical cross-entropy and sparse categorical cross-entropy. Finally, TensorBoard for performance visualization and debugging.

Python, TensorFlow, Keras (including the Tokenizer), Pandas, NumPy, Matplotlib. Understanding of the Sequential API, the Functional API, and the model-subclassing style of Keras model implementation. The reason I have chosen the subclassed model is that it is fully customizable and enables you to implement your own custom forward pass of the model. It also gives us control over every nuance of the network and the training process.
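
To make this concrete, here is a minimal sketch of a subclassed Keras model with a hand-written forward pass; the layer sizes are arbitrary examples, not values from this project.

import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(64, activation="relu")
        self.dense2 = tf.keras.layers.Dense(10)

    def call(self, inputs, training=False):
        # The forward pass is written by hand, giving full control
        x = self.dense1(inputs)
        return self.dense2(x)

model = MyModel()
print(model(tf.random.normal((2, 32))).shape)  # (2, 10)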


Below I have mentioned important blogs and tutorials to begin with.

1. https://www.tensorflow.org/tutorials/text/nmt_with_attention — TensorFlow Tutorial

2. https://www.tensorflow.org/tutorials/text/image_captioning — TensorFlow Tutorial

3. https://becominghuman.ai/transfer-learning-retraining-inception-v3-for-custom-image-classification-2820f653c557 — Transfer Learning tutorial

4. https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202 — InceptionV3 model tutorial

5. https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/ — why ImageNet — why InceptionV3

6. https://www.pyimagesearch.com/2019/10/28/3-ways-to-create-a-keras-model-with-tensorflow-2-0-sequential-functional-and-model-subclassing/ — 3 ways Keras model implementation

7. https://www.tensorflow.org/tensorboard/get_started — TensorBoard Tutorial

4. Existing Research-Papers/Solutions

This work is inspired by the research paper and blogs below:

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

In this paper, the authors use an encoder-decoder model with an attention mechanism. In the encoder, they use a CNN to extract features from the images. In the decoder, they use a long short-term memory (LSTM) network that produces a caption by generating one word at each time step, conditioned on a context vector, the previous hidden state, and the previously generated words. They use the BLEU score to measure the performance of the model.

A few other blogs I have referenced:

1. https://towardsdatascience.com/image-captioning-in-deep-learning-9cd23fb4d8d2

2. https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/

5. My Approach — Solution

Initially I will do the exploratory data analysis on both the image input and the text output. With EDA I can find the data imbalance, the number of images available per patient, and the type of images associated with each patient (a quick sketch of these checks follows below). After the EDA I will implement a deep learning model with two different approaches, to find the improvement of one over the other.
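
For illustration, the checks could look like this with Pandas; this assumes the per-image CSV built in section 6, and the column names image_id and impression are my assumption.

import pandas as pd

# Assumed CSV from the XML-parsing step: one row per image,
# with columns "image_id" (e.g. "CXR1_1_IM-0001-3001") and "impression"
df = pd.read_csv("data_points.csv")

# Most frequent impression texts reveal the class imbalance
print(df["impression"].value_counts().head(10))

# The report id is the prefix before the first "_", so this counts images per report
images_per_report = df.groupby(df["image_id"].str.split("_").str[0]).size()
print(images_per_report.value_counts())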

1. The basic model:

A simple encoder-decoder architecture. The encoder has a single fully connected layer that takes the image features extracted by a pretrained InceptionV3 model and produces a feature vector. The decoder has an LSTM layer that takes two inputs: the image feature vector and the text sequence, predicting the next word at each time step.
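
A rough sketch of this wiring; the dimensions, and using the image vector to initialize the LSTM state, are my assumptions for illustration.

import tensorflow as tf

vocab_size, embedding_dim, units = 5000, 256, 512  # assumed sizes

image_features = tf.keras.Input(shape=(2048,))   # InceptionV3 feature vector
word_ids = tf.keras.Input(shape=(None,))         # impression as a word-id sequence

x = tf.keras.layers.Embedding(vocab_size, embedding_dim)(word_ids)
state = tf.keras.layers.Dense(units)(image_features)  # project image into LSTM state
lstm_out = tf.keras.layers.LSTM(units)(x, initial_state=[state, state])
next_word = tf.keras.layers.Dense(vocab_size)(lstm_out)

model = tf.keras.Model(inputs=[image_features, word_ids], outputs=next_word)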

2. Main Model:

I will use an encoder-decoder architecture to generate the impression from the chest X-ray. The encoder outputs the image feature vectors. The feature vectors are then passed to a decoder with an attention mechanism, which generates the next word based on the content of the image. Starting from the same approach as the basic model, I will create a new architecture based on the research paper Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification.

As an initial step I will train an image classification model using InceptionV3 on this dataset: https://www.kaggle.com/yash612/covidnet-mini-and-gan-enerated-chest-xray. I will save the weights from this training and use them for feature extraction in the encoder, by loading the saved weights into InceptionV3.
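
A sketch of this pre-training step; the number of classes and the weights file name are assumptions.

import tensorflow as tf

# InceptionV3 backbone with a small classification head
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet",
    input_shape=(299, 299, 3), pooling="avg")
out = tf.keras.layers.Dense(3, activation="softmax")(base.output)  # assumed 3 classes
clf = tf.keras.Model(base.input, out)
clf.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# ... train clf on the chest X-ray classification dataset ...
clf.save_weights("chest_xray_inceptionv3.h5")  # reloaded later by the encoder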

Encoder:

The encoder is a single fully connected linear layer. Each input image is given to InceptionV3 to extract its features. The extracted features of the two images are added and fed to the FC layer to get the output vector. This last hidden state of the encoder is connected to the decoder.
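
A minimal sketch of the encoder, assuming the InceptionV3 features of each view are precomputed; the embedding size is an assumption.

import tensorflow as tf

class Encoder(tf.keras.Model):
    """Single FC layer over the summed features of the two views."""
    def __init__(self, embedding_dim=256):
        super().__init__()
        self.fc = tf.keras.layers.Dense(embedding_dim, activation="relu")

    def call(self, features_frontal, features_lateral):
        # Element-wise addition of the two InceptionV3 feature vectors
        combined = features_frontal + features_lateral
        return self.fc(combined)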

Decoder:

The decoder has a Bidirectional LSTM layer that performs language modelling at the word level. The first time step receives the encoded output from the encoder along with the <start> vector. This input is passed to a two-stage Bidirectional LSTM layer with an attention mechanism. The output consists of two vectors: the predicted label, and the previous hidden state of the decoder, which is fed back to the decoder at each time step. The detailed architecture is shown below.

[Image: high-level model architecture]
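
A condensed sketch of the decoder idea; Keras' dot-product Attention layer stands in for the paper's attention mechanism, only one Bidirectional LSTM stage is shown, and all sizes are assumptions.

import tensorflow as tf

class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim=256, units=512):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.bilstm = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(units, return_sequences=True))
        self.attention = tf.keras.layers.Attention()  # dot-product stand-in
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, word_ids, encoder_output):
        # word_ids: (batch, 1) current word; encoder_output: (batch, 1, embedding_dim)
        x = self.embedding(word_ids)                   # (batch, 1, emb)
        context = self.attention([x, encoder_output])  # attend over encoder output
        x = tf.concat([context, x], axis=-1)
        out = self.bilstm(x)                           # (batch, 1, 2 * units)
        return self.fc(out[:, -1, :])                  # logits for the next word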

6. XML Parsing Creating Data Points

In this section we will see how the raw XML data is parsed and structured into data points. The data points are then stored in CSV files for later model requirements.

Raw XML Tree View:

<eCitation>
<meta type="rr" />
<uId id="CXR1" />
<pmcId id="1" />
<docSource>CXR</docSource>
<IUXRId id="1" />
<licenseType>open-access</licenseType>
<licenseURL>http://creativecommons.org/licenses/by-nc-nd/4.0/</licenseURL>
<ccLicense>byncnd</ccLicense>
<articleURL />
<articleDate>2013-08-01</articleDate>
<articleType>XR</articleType>
<publisher>Indiana University</publisher>
<title>Indiana University Chest X-ray Collection</title>
<note>The data are drawn from multiple hospital systems.</note>
<specialty>pulmonary diseases</specialty>
<subset>CXR</subset>
<MedlineCitation Owner="Indiana University" Status="supplied by publisher">
<Article PubModel="Electronic">
<Journal>
<JournalIssue>
<PubDate>
<Year>2013</Year>
<Month>08</Month>
<Day>01</Day>
</PubDate>
</JournalIssue>
</Journal>
<ArticleTitle>Indiana University Chest X-ray Collection</ArticleTitle>
<Abstract>
<AbstractText Label="COMPARISON">None.</AbstractText>
<AbstractText Label="INDICATION">Positive TB test</AbstractText>
<AbstractText Label="FINDINGS">The cardiac silhouette and mediastinum size are within normal limits. There is no pulmonary edema. There is no focal consolidation. There are no XXXX of a pleural effusion. There is no evidence of pneumothorax.</AbstractText>
<AbstractText Label="IMPRESSION">Normal chest x-XXXX.</AbstractText>
</Abstract>
<Affiliation>Indiana University</Affiliation>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Kohli</LastName>
<ForeName>Marc</ForeName>
<Initials>MD</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Rosenman</LastName>
<ForeName>Marc</ForeName>
<Initials>M</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType>Radiology Report</PublicationType>
</PublicationTypeList>
<ArticleDate>
<Year>2013</Year>
<Month>08</Month>
<Day>01</Day>
</ArticleDate>
</Article>
<EssieArticleTitle>Indiana University Chest X-ray Collection</EssieArticleTitle>
<IMedAuthor>Marc David Kohli MD</IMedAuthor>
<IMedAuthor>Marc Rosenman M</IMedAuthor>
</MedlineCitation>
<MeSH>
<major>normal</major>
</MeSH>
<parentImage id="CXR1_1_IM-0001-3001">
<figureId>F1</figureId>
<caption>Xray Chest PA and Lateral</caption>
<panel type="single">
<url>/hadoop/storage/radiology/extract/CXR1_1_IM-0001-3001.jpg</url>
<imgModality>7</imgModality>
<region type="panel">
<globalImageFeatures>
<CEDD>f2p0k1352</CEDD>
<ColorLayout>f1p0k36</ColorLayout>
<EdgeHistogram>f0p0k969</EdgeHistogram>
<FCTH>f4p0k2423</FCTH>
<SemanticContext60>f3p0k305</SemanticContext60>
</globalImageFeatures>
</region>
</panel>
</parentImage>
<parentImage id="CXR1_1_IM-0001-4001">
<figureId>F2</figureId>
<caption>Xray Chest PA and Lateral</caption>
<panel type="single">
<url>/hadoop/storage/radiology/extract/CXR1_1_IM-0001-4001.jpg</url>
<imgModality>7</imgModality>
<region type="panel">
<globalImageFeatures>
<CEDD>f2p0k1013</CEDD>
<ColorLayout>f1p0k36</ColorLayout>
<EdgeHistogram>f0p0k184</EdgeHistogram>
<FCTH>f4p0k1133</FCTH>
<SemanticContext60>f3p0k277</SemanticContext60>
</globalImageFeatures>
</region>
</panel>
</parentImage>
</eCitation>

From the XML file we will extract the Abstract and parentImage nodes. These hold the impression and the image file names, as shown below.

Impression label:

We will retrieve the AbstractText values:

<Abstract>
<AbstractText Label="COMPARISON">None.</AbstractText>
<AbstractText Label="INDICATION">Positive TB test</AbstractText>
<AbstractText Label="FINDINGS">The cardiac silhouette and mediastinum size are within normal limits. There is no pulmonary edema. There is no focal consolidation. There are no XXXX of a pleural effusion. There is no evidence of pneumothorax.</AbstractText>
<AbstractText Label="IMPRESSION">Normal chest x-XXXX </AbstractText>
</Abstract>

Image File name:

The image file name is available in the id attribute. We can ignore the other details because they are not relevant for our report. As there are two parentImage nodes, this report has two images.

<parentImage id="CXR1_1_IM-0001-3001">
<figureId>F1</figureId>
<caption>Xray Chest PA and Lateral</caption>
<panel type="single">
<url>/hadoop/storage/radiology/extract/CXR1_1_IM-0001-3001.jpg</url>
<imgModality>7</imgModality>
<region type="panel">
<globalImageFeatures>
<CEDD>f2p0k1352</CEDD>
<ColorLayout>f1p0k36</ColorLayout>
<EdgeHistogram>f0p0k969</EdgeHistogram>
<FCTH>f4p0k2423</FCTH>
<SemanticContext60>f3p0k305</SemanticContext60>
</globalImageFeatures>
</region>
</panel>
</parentImage>
<parentImage id="CXR1_1_IM-0001-4001">
<figureId>F2</figureId>
<caption>Xray Chest PA and Lateral</caption>
<panel type="single">
<url>/hadoop/storage/radiology/extract/CXR1_1_IM-0001-4001.jpg</url>
<imgModality>7</imgModality>
<region type="panel">
<globalImageFeatures>
<CEDD>f2p0k1013</CEDD>
<ColorLayout>f1p0k36</ColorLayout>
<EdgeHistogram>f0p0k184</EdgeHistogram>
<FCTH>f4p0k1133</FCTH>
<SemanticContext60>f3p0k277</SemanticContext60>
</globalImageFeatures>
</region>
</panel>
</parentImage>

XML parser code to retrieve the details mentioned above.
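
A minimal parser sketch using Python's built-in xml.etree.ElementTree; the reports folder and the output CSV name are assumptions.

import os
import xml.etree.ElementTree as ET
import pandas as pd

rows = []
for fname in os.listdir("reports"):            # folder holding the .xml reports
    root = ET.parse(os.path.join("reports", fname)).getroot()

    # IMPRESSION text from the Abstract node
    impression = None
    for node in root.iter("AbstractText"):
        if node.get("Label") == "IMPRESSION":
            impression = node.text

    # One report can reference several images via <parentImage id="...">
    for img in root.iter("parentImage"):
        rows.append({"image_id": img.get("id"), "impression": impression})

pd.DataFrame(rows).to_csv("data_points.csv", index=False)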