Deep learning with X-ray Eyes

Source: Deep Learning on Medium

CXR Data Preprocessing — Augmentation

Let’s now go through the preprocessing steps for the CXR (chest X-ray) dataset. Three properties of the data matter here:

  1. X-ray images have very high resolution; each image is 1024 x 1024 pixels.
  2. X-ray images are single-channel images, not RGB (three-channel) images.
  3. Center crop should not be applied to X-ray images, so that abnormalities in the corners of an image aren’t missed.

This brings us to the next aspect of data preprocessing: data augmentation. Often the amount of data we have is not sufficient to train a classifier well. In such cases, we perform data augmentation.

Pooling, as we know, increases invariance. If a sign of abnormality is in the top-left corner of an image, pooling lets the network still recognise it when it is shifted slightly left, right, up or down around that corner. Training on augmented data (flipping, rotation, cropping, translation, illumination changes, scaling, added noise and so on) goes further: the model sees these variations directly and learns them. This can significantly boost the accuracy of the model, so that an abnormality in any corner of the image can still be recognised.

Augmentation is applied with Keras’s ImageDataGenerator, configured as follows.

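from keras.preprocessing.image import ImageDataGenerator

# Note: the featurewise_* options only take effect once datagen.fit(...) has
# been called on training data and images pass through standardize()/flow().
datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=10,       # rotate by up to +/-10 degrees
    width_shift_range=0,     # no horizontal shift
    height_shift_range=0,    # no vertical shift
    vertical_flip=False,     # never turn a CXR upside down
)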

Vertical flip must be set to ‘False’ because CXR images have a natural top-to-bottom orientation.

CXR Data Preprocessing — Normalisation

Since CXR images are not “natural images”, we do not use the “divide by 255” strategy. We cannot be sure that every pixel lies in the range 0–255, so we normalise each image with its own minimum and maximum values instead (min-max normalisation).

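import numpy as np
from skimage.transform import rescale

def preprocess_img(img, mode):
    # Min-max normalisation: map pixel values to [0, 1]
    img = (img - img.min()) / (img.max() - img.min())
    # Downscale by a factor of 4 (1024 -> 256 pixels per side).
    # Newer scikit-image versions replace multichannel=True with channel_axis=-1.
    img = rescale(img, 0.25, multichannel=True, mode='constant')

    if mode == 'train':
        # randn() is positive about half the time, so ~50% of images are augmented
        if np.random.randn() > 0:
            img = datagen.random_transform(img)
    return img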

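Inside the conditional ‘if mode == 'train'’, a random number generator decides whether to transform the image, so that only a fraction of the training images (about half, since a standard normal sample is positive half of the time) get transformed rather than all of them.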
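Two details of this step are easy to trip over: the min-max formula divides by zero when an image is constant (for example, a blank scan), and `np.random.randn() > 0` fixes the augmented fraction at roughly one half. The sketch below shows one possible way to handle both. The names `min_max_normalise`, `maybe_augment`, `eps` and `p` are illustrative and not part of the original code.

```python
import numpy as np

def min_max_normalise(img, eps=1e-8):
    """Scale pixel values to [0, 1] using the image's own min and max."""
    img = img.astype(np.float32)
    value_range = img.max() - img.min()
    if value_range < eps:
        # Constant image: avoid dividing by zero and return all zeros.
        return np.zeros_like(img)
    return (img - img.min()) / value_range

def maybe_augment(img, transform, p=0.5, rng=None):
    """Apply `transform` to `img` with probability `p`, else return it unchanged."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < p:
        return transform(img)
    return img

# A made-up 12-bit image: dividing by 255 would not map it into [0, 1].
raw = np.array([[0, 1024], [2048, 4095]], dtype=np.uint16)
scaled = min_max_normalise(raw)
```

With `p` exposed as a parameter, the share of augmented images can be tuned instead of being tied to the sign of a normal sample.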

CXR: Network Building

We will use ResNet for classification. Since ResNets have become quite prevalent in the industry, it is worth spending some time on the important elements of their architecture. Let’s start with the original architecture proposed in the ResNet paper. The basic problem ResNet solved was that very deep networks were hard to optimise: a plain 56-layer net, for example, had a lower training accuracy than a 20-layer net. By the way, before ResNets, anything with more than about 20 layers was called very deep.

Now let’s see how ResNets solved this problem. Consider the figure above (from the paper). Let’s say the input to some ‘unit’ of a network is x (the unit has two weight layers). Let’s say that, ideally, this unit should learn some function H(x), i.e. given the input x, the unit should learn to produce the desired output H(x).

In a normal neural net, these two layers (i.e. this unit) would try to learn the function H(x) directly. ResNets tried a different trick. The authors argued: let F(x) denote the residual between H(x) and x, i.e. F(x)=H(x)−x, and add x back through a shortcut connection so that the unit outputs F(x)+x. They hypothesised that it is easier to learn the residual function F(x) than to learn H(x). In the extreme case where the unit should simply let the signal pass through (i.e. H(x)=x is the optimal thing to learn), it is easier to push the residual F(x) to zero than to learn the identity H(x)=x with a stack of nonlinear layers.

Experiments on deep nets supported this hypothesis: if letting the signal pass through was the optimal thing to do (i.e. it reduced the loss), the units learnt F(x)=0, but if there was something useful to learn, the units learnt that. These units are called residual units.
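The idea can be sketched in a few lines of NumPy. This is a simplified illustration with dense layers rather than the convolutional layers a real ResNet uses, and `relu`, `residual_unit`, `w1` and `w2` are names chosen for this example only. It shows that when the residual weights are zero, F(x) = 0 and the unit passes its input straight through.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_unit(x, w1, w2):
    """Two weight layers computing F(x), plus the identity shortcut."""
    f_x = relu(x @ w1) @ w2      # residual branch F(x)
    return relu(f_x + x)         # output F(x) + x, followed by a ReLU

rng = np.random.default_rng(0)
x = relu(rng.normal(size=(4, 8)))   # non-negative activations, as after a ReLU
zero = np.zeros((8, 8))

# With all residual weights at zero, F(x) = 0 and the unit is an identity mapping.
identity_out = residual_unit(x, zero, zero)

# With non-zero weights, the unit adds a learned correction on top of x.
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))
learned_out = residual_unit(x, w1, w2)
```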

Let’s return to our project. The model-building process takes the following inputs: one image channel, two classes (‘effusion’ and ‘nofinding’), and 256 rows and columns.

img_channels = 1
img_rows = 256
img_cols = 256

nb_classes = 2

We use a data generator to load only what is needed at each step, so that we don’t exceed memory limits.
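A minimal sketch of that idea, assuming a plain Python generator: images are read batch by batch, so only one batch is in memory at a time. The function `batch_generator`, the `load_fn` argument and the file names are hypothetical and stand in for whatever loader the project actually uses.

```python
def batch_generator(paths, labels, batch_size, load_fn):
    """Yield (images, labels) batches, loading each image only when needed."""
    for start in range(0, len(paths), batch_size):
        batch_paths = paths[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        # Only the images of the current batch are held in memory.
        images = [load_fn(p) for p in batch_paths]
        yield images, batch_labels

# Stand-in loader that records which files were read (hypothetical file names).
loaded = []

def fake_load(path):
    loaded.append(path)
    return [[0.0]]

paths = ["img_%d.png" % i for i in range(5)]
labels = [0, 1, 0, 0, 1]
gen = batch_generator(paths, labels, batch_size=2, load_fn=fake_load)
first_images, first_labels = next(gen)
```

After the first `next(gen)` only two files have been read. A generator used for training typically also needs to loop over the data repeatedly and shuffle it; this sketch makes a single pass to keep the logic visible.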

The basic steps to build the model are:

  1. Import resnet, as we will use transfer learning.
  2. Run the augmented data generator.
  3. Perform an ablation run.
  4. Fit the model.

CXR: Train Model

Now it’s time to train the model. The first step of model training is to conduct an ablation experiment, which checks whether the code is working as expected. We run it with a small dataset and a small number of epochs to identify any potential problems.

We can learn a few facts from the first ablation run. The classes are highly imbalanced: the ratio of ‘effusion’ to ‘nofinding’ is roughly 1:10 (107 vs 1000). Because most of the data belongs to one class, plain training will not work: the model mostly learns to classify everything as ‘nofinding’ and still reaches high accuracy. Around 90 per cent (1000/1107) of the data is ‘nofinding’, so classifying every image as ‘nofinding’ already gives about 90 per cent accuracy, which is close to the 96 per cent we achieved. The objective of correctly classifying ‘effusion’ is therefore not fulfilled. The high accuracy is misleading, so we will use Area Under Curve (AUC) to validate the result.
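The arithmetic behind this is worth spelling out. Using the class counts quoted above, a classifier that labels every image ‘nofinding’ gets the following accuracy and detects none of the ‘effusion’ cases:

```python
n_effusion = 107
n_nofinding = 1000
total = n_effusion + n_nofinding

# Accuracy of a classifier that labels every image 'nofinding'.
baseline_accuracy = n_nofinding / total
# Fraction of 'effusion' cases such a classifier detects.
baseline_recall = 0 / n_effusion

print(round(baseline_accuracy, 3))  # 0.903
print(baseline_recall)              # 0.0
```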

CXR: Train Model with AUC

What is Area Under Curve (AUC)?

The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. The ROC curve plots the true positive rate against the false positive rate as the threshold varies, and AUC, the area under that curve, represents the degree of separability: it tells how well the model can distinguish between classes. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. By analogy, the higher the AUC, the better the model is at distinguishing between patients with and without the disease.
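One way to make this concrete is the pairwise view of AUC: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below implements that definition directly in plain Python (it is quadratic in the number of examples, so it is meant for illustration rather than for large datasets). The function name `auc` and the toy scores are made up for this example.

```python
def auc(labels, scores):
    """Probability that a random positive scores higher than a random negative.

    Ties count as half. This is the pairwise (rank) definition of ROC AUC.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 0, 0, 1, 1]
good_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]    # positives ranked on top
constant = [0.0] * 6                             # always predicts 'nofinding'
reversed_scores = [1 - s for s in good_scores]   # ranking flipped

print(auc(labels, good_scores))      # 1.0
print(auc(labels, constant))         # 0.5
print(auc(labels, reversed_scores))  # 0.0
```

A constant ‘nofinding’ predictor can look accurate on imbalanced data, yet its AUC is only 0.5, and a model whose ranking is inverted falls below 0.5.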

AUC vs Accuracy

Moment of Truth

We observed that accuracy is high, approximately .93 to .96, while AUC is much lower at .29. An AUC below 0.5 means the model ranks the classes in reverse: it tends to give higher ‘effusion’ scores to ‘nofinding’ images than to actual ‘effusion’ images.

Our observations from the second ablation run are:

  1. The model is not performing very well on our chosen measure of AUC.
  2. The primary reason for this is the prevalence problem. There are just not as many abnormal cases available in the dataset. This problem will occur in almost all medical imaging problems, and for that matter, in most datasets that have a class imbalance.
  3. To tackle this problem, we introduced ‘weighted categorical cross-entropy’. This is a measure of loss, which applies weights to different forms of errors.

Solution for low Prevalence rate

A common solution to the low prevalence rate problem is using a weighted cross-entropy loss. The loss is modified such that misclassifications of the low-prevalence class are penalised more heavily than the other class.

Therefore, every time the model makes an error on the abnormal class (in this case, ‘effusion’), we penalise it heavily by multiplying the loss by a large weight. This increases the loss for misclassified examples, so the gradients computed by backpropagation are larger and the network weights responsible for those misclassifications receive larger updates.

Let’s say “no finding” is class 0 and “effusion” is class 1.

bin_weights[0,0]: actual class 0, predicted class 0, so no penalty, just the normal weight of 1.

bin_weights[1,1]: actual class 1, predicted class 1, so no penalty, just the normal weight of 1.

In case of misclassification:

bin_weights[1,0]: actual class 1, predicted class 0, penalise by a weight of 5.

bin_weights[0,1]: actual class 0, predicted class 1, penalise by a weight of 5.

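from functools import partial
from itertools import product

import numpy as np
from keras import backend as K

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    # One-hot matrix marking the predicted (highest-probability) class per sample
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    # Accumulate weights[true class, predicted class] for each sample
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    cross_ent = K.categorical_crossentropy(y_true, y_pred, from_logits=False)
    return cross_ent * final_mask

bin_weights = np.ones((2, 2))
bin_weights[0, 1] = 5  # actual class 0, predicted class 1
bin_weights[1, 0] = 5  # actual class 1, predicted class 0
ncce = partial(w_categorical_crossentropy, weights=bin_weights)
# partial objects have no __name__, so one is set explicitly
ncce.__name__ = 'w_categorical_crossentropy'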
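To see what this loss does on concrete numbers, here is a NumPy re-implementation of the same masking idea. It is a sketch for illustration, not the Keras code itself: it uses `argmax` to find the predicted class (so ties are resolved to the first class, unlike the equality test in the Keras version) and assumes the predicted probabilities are strictly positive. The name `weighted_cce` and the sample predictions are made up.

```python
import numpy as np

def weighted_cce(y_true, y_pred, weights):
    """Per-sample cross-entropy multiplied by weights[true class, predicted class]."""
    true_cls = y_true.argmax(axis=1)
    pred_cls = y_pred.argmax(axis=1)
    mask = weights[true_cls, pred_cls]
    cross_ent = -np.sum(y_true * np.log(y_pred), axis=1)
    return cross_ent * mask

bin_weights = np.ones((2, 2))
bin_weights[0, 1] = 5
bin_weights[1, 0] = 5

y_true = np.array([[0.0, 1.0], [0.0, 1.0]])   # both samples are 'effusion'
y_pred = np.array([[0.3, 0.7], [0.8, 0.2]])   # first is correct, second is missed
losses = weighted_cce(y_true, y_pred, bin_weights)
```

The correctly classified sample keeps its plain cross-entropy of -log(0.7), while the missed ‘effusion’ has its cross-entropy of -log(0.2) multiplied by 5.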

CXR: Final Run

We identified a root cause for the problem and proposed a specific solution, weighted categorical cross-entropy. With that in place, we start our final run.

We save the model weights only when there is an increase in ‘validation accuracy’. Saving the model weights this way is called checkpointing, and it is done through a Keras callback. The results clearly demonstrate that the AUC has increased significantly, from 0.19 (without weights) to 0.86 (with weights).
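The rule behind checkpointing is simple: keep track of the best value of the monitored metric and save only when a new value beats it. In Keras this is commonly handled by the ModelCheckpoint callback with save_best_only=True; the plain-Python sketch below only illustrates the rule itself. The class `BestMetricTracker` and the accuracy values are hypothetical.

```python
class BestMetricTracker:
    """Remember the best value seen so far and report when it improves."""

    def __init__(self):
        self.best = float("-inf")

    def improved(self, value):
        if value > self.best:
            self.best = value
            return True
        return False

tracker = BestMetricTracker()
val_accuracy_per_epoch = [0.80, 0.85, 0.83, 0.90]   # made-up values
saved_epochs = []
for epoch, val_acc in enumerate(val_accuracy_per_epoch):
    if tracker.improved(val_acc):
        # In a real run this is where the weights would be written to disk.
        saved_epochs.append(epoch)

print(saved_epochs)  # [0, 1, 3]
```

Epoch 2 is skipped because its validation accuracy is lower than the best value seen so far, so the saved weights always correspond to the best epoch.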

Making a Prediction

We use our model to predict whether an X-ray image is ‘effusion’ or ‘nofinding’.

For the prediction we use an untouched test image, i.e. one that was not part of the training or validation dataset.

Test image looks like above

We reload our ResNet and load the best weights saved during the final run above.

Prediction of the Test image datapoint

So, according to the model, the test image shows an effusion.

Conclusion and Next steps

I’ve only just started using CNNs on medical images. Going forward, I’ll be extending the model to predict more outputs and I will try to improve its accuracy. You’re welcome to join me on the project GitHub repo. I have uploaded my training data to Google Drive.

Let’s build together or try it out yourself and beat me to it. Either way, keep learning and improving yourself. Until next time, stay sharp!