Applying Faster R-CNN for Object Detection on Malaria Cells

Source: Deep Learning on Medium

1. Context

Malaria is a disease caused by Plasmodium parasites that remains a major global health threat, affecting roughly 200 million people and causing about 400,000 deaths a year. The main species that infect humans are Plasmodium falciparum and Plasmodium vivax.

For malaria, as for other microbial infections, manual inspection of thick and thin blood smears by trained microscopists remains the gold standard for parasite detection and stage determination because of its low reagent and instrument cost and high flexibility. Although manual inspection is extremely low-throughput and susceptible to human bias, automatic counting software remains largely unused because of the wide range of variation in brightfield microscopy images. A robust automatic counting and cell classification solution would nonetheless provide enormous benefits: faster, more accurate quantitative results free of human variability, letting researchers and medical professionals better characterize stage-specific drug targets and better quantify patient responses to drugs.

Previous attempts to automate the identification and quantification of malaria parasites have not gained major traction, partly because of the difficulty of replication, comparison, and extension. Authors also rarely make their image sets available, which precludes replication of results and assessment of potential improvements. The lack of both a standard image set and a standard set of reporting metrics has impeded the field.

2. About Images & Objective

Images are in .png or .jpg format. There are three sets totalling 1,364 images (~80,000 cells), each prepared by a different researcher: one from Brazil (Stefanie Lopes), one from Southeast Asia (Benoit Malleret), and a time-course set (Gabriel Rangel). Blood smears were stained with Giemsa reagent.

Labels: The data consists of two classes of uninfected cells (RBCs and leukocytes) and four classes of infected cells (gametocytes, rings, trophozoites, and schizonts). Annotators were permitted to mark a cell as difficult if it did not clearly fall into one of the cell classes. The data is heavily imbalanced: uninfected RBCs make up over 95% of all cells, far outnumbering uninfected leukocytes and infected cells.

A class label and set of bounding box coordinates were given for each cell. For all data sets, infected cells were given a class label by Stefanie Lopes, malaria researcher at the Dr. Heitor Vieira Dourado Tropical Medicine Foundation hospital, indicating stage of development or marked as difficult.

Objective: Train a Faster R-CNN object detector on the malaria cell images and classify the objects in each image into the cell classes listed above with the best possible precision, following the approach described in the original Faster R-CNN paper.

3. Evaluation Metric

Mean Average Precision (mAP): This is the standard metric for measuring the accuracy of object detection techniques such as Faster R-CNN and SSD. mAP penalizes the model when an object is misclassified or not detected.
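mAP is built on Intersection over Union (IoU) matching between predicted and ground-truth boxes. A minimal sketch of the IoU computation, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x_min, y_min, x_max, y_max) form."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Overlap 5x5 = 25, union 100 + 100 - 25 = 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.1429
```

A detection counts as a true positive only when its IoU with a same-class ground-truth box exceeds the chosen threshold (0.7 in this project's evaluation).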

4. Train & Test Data

  1. Images for train and test data are provided in .jpg and .png format in an Image folder.
  2. JSON files are provided for the train and test images, containing the columns below.

Columns in both train and test

a. checksum — unique image ID

b. pathname — path of the image file

c. shapeR — number of rows in the image (height)

d. shapeC — number of columns in the image (width)

e. channels — number of colour channels

f. bbminR — bounding box minimum row co-ordinate

g. bbminC — bounding box minimum column co-ordinate

h. bbmaxR — bounding box maximum row co-ordinate

i. bbmaxC — bounding box maximum column co-ordinate

j. category — class of the object in the bounding box

5. Libraries Used

  1. Pandas: Pandas is a Python library for data manipulation and analysis. We use it to read the train and test CSV files into data frames and to perform further data operations.
  2. JSON : The json library can parse JSON from strings or files. The library parses JSON into a Python dictionary or list. It can also convert Python dictionaries or lists into JSON strings. In this project, train and test files are in JSON format. Using the JSON library, the JSON files are converted to csv files.
  3. Imagesize: This library parses an image file’s header and returns the image size. We use it to retrieve the height and width of each image without loading the full image.
  4. Keras: Keras is an open-source neural-network library written in Python, capable of running on top of TensorFlow. It contains numerous implementations of commonly used neural-network building blocks such as layers, objectives, activation functions, and optimizers, along with a host of tools that make working with image and text data easier and simplify writing deep neural network code.
  5. Pickle: The Python pickle module is used for serializing and de-serializing Python object structures. Any object in Python can be pickled so that it can be saved to disk. We use pickle to save the trained model in this project.
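As a minimal illustration of the save/load round trip (the dictionary here is only a stand-in for the actual trained model object):

```python
import os
import pickle
import tempfile

# Stand-in for a trained model / its configuration.
model_state = {"anchor_scales": [8, 16, 32], "epochs": 100}

# Serialize to disk, then load it back.
path = os.path.join(tempfile.gettempdir(), "model_state.pkl")
with open(path, "wb") as f:
    pickle.dump(model_state, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model_state)  # True
```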

6. Image Analysis

  1. The train and test image data, which is in JSON format, is first written to CSV files for further analysis. Using the code below, it can be read and saved as a CSV file.
Train CSV
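The conversion can be sketched with only the standard library. The JSON layout below is a simplified, hypothetical miniature of the real files, whose field nesting may differ:

```python
import csv
import io
import json

# Hypothetical miniature of the training JSON: one image, one annotated cell.
# Field names follow the columns listed in section 4.
train_json = """
[{"checksum": "abc123",
  "pathname": "/images/im1.png",
  "shape": {"r": 1200, "c": 1600, "channels": 3},
  "objects": [{"category": "red blood cell",
               "bounding_box": {"minimum": {"r": 10, "c": 20},
                                "maximum": {"r": 90, "c": 110}}}]}]
"""

# Flatten one row per annotated object.
rows = []
for image in json.loads(train_json):
    for obj in image["objects"]:
        bb = obj["bounding_box"]
        rows.append({
            "checksum": image["checksum"],
            "pathname": image["pathname"],
            "shapeR": image["shape"]["r"],
            "shapeC": image["shape"]["c"],
            "channels": image["shape"]["channels"],
            "bbminR": bb["minimum"]["r"],
            "bbminC": bb["minimum"]["c"],
            "bbmaxR": bb["maximum"]["r"],
            "bbmaxC": bb["maximum"]["c"],
            "category": obj["category"],
        })

# Write the flattened rows as CSV (in-memory here; a file in practice).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
train_csv = buf.getvalue()
print(train_csv)
```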

2. The train data has 7 labels and the test data has 6. The data is highly imbalanced: red blood cells dominate the counts in both the train and test images.

Label counts
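The imbalance can be quantified with a simple frequency count over the category column; the counts below are illustrative stand-ins, not the real dataset figures:

```python
from collections import Counter

# Hypothetical category column as read from the training CSV.
categories = (["red blood cell"] * 960 + ["trophozoite"] * 20 +
              ["ring"] * 12 + ["schizont"] * 4 +
              ["gametocyte"] * 2 + ["leukocyte"] * 1 + ["difficult"] * 1)

counts = Counter(categories)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label:15s} {n:5d} ({100 * n / total:.1f}%)")
```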

3. Bounding box co-ordinates and labels are provided for each object in every image. Drawing these bounding boxes shows what a train and a test image look like.

Train Image
Test Image

4. The input for training Faster R-CNN is an annotation text file listing the image filename, bounding box co-ordinates, and category of each object. The train and test annotation text files can be created using the code below.

csv to text file
Annotation File
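A stdlib-only sketch of the conversion, assuming the `filepath,x1,y1,x2,y2,class_name` annotation format commonly used by Keras Faster R-CNN implementations (x = column, y = row; the CSV content is illustrative):

```python
import csv
import io

# Miniature CSV with the bounding-box columns described earlier.
train_csv = """pathname,bbminR,bbminC,bbmaxR,bbmaxC,category
/images/im1.png,10,20,90,110,red blood cell
/images/im1.png,30,40,70,95,trophozoite
"""

# One annotation line per object: filepath,x1,y1,x2,y2,class_name.
lines = []
for row in csv.DictReader(io.StringIO(train_csv)):
    lines.append(",".join([row["pathname"],
                           row["bbminC"], row["bbminR"],
                           row["bbmaxC"], row["bbmaxR"],
                           row["category"]]))
annotation = "\n".join(lines)
print(annotation)
```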

5. We can now proceed to build the Faster R-CNN model and train on the Malaria Images.

7. Applying Faster R-CNN

Faster R-CNN
  1. Base Model : Faster R-CNN uses a base CNN such as VGG-16 or ResNet to extract image features. In this project, a VGG-16 model pre-trained on the ImageNet dataset is used as the base feature extractor. Because Faster R-CNN uses a separate classification layer to assign labels to objects, the output of an intermediate convolutional layer is passed on to the next stage.

2. Anchors : These are fixed bounding boxes placed throughout the image at different sizes and aspect ratios, used as references when first predicting object locations. Because the objects in each malaria image are very small, the anchor box scales are a hyperparameter that can be adjusted; the scales used for these images are [8, 16, 32].
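Anchor generation at a single feature-map point can be sketched as follows. The scales [8, 16, 32] come from the project; the aspect-ratio set is an assumed typical choice, and real implementations usually preserve anchor area across ratios rather than simply stretching:

```python
anchor_scales = [8, 16, 32]          # from the project
anchor_ratios = [(1, 1), (1, 2), (2, 1)]  # assumed typical ratio set

def anchors_at(cx, cy):
    """All anchor boxes (x1, y1, x2, y2) centred on the point (cx, cy)."""
    boxes = []
    for scale in anchor_scales:
        for rx, ry in anchor_ratios:
            w, h = scale * rx, scale * ry
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

anchors = anchors_at(100, 100)
print(len(anchors))  # 3 scales x 3 ratios = 9 anchors per point
```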

Positive Anchors

3. Region Proposal Network : The RPN takes all the reference boxes (anchors) and sorts them into two categories: anchors that overlap a ground-truth object with an Intersection over Union (IoU) greater than 0.5 are labelled “foreground”, while anchors that do not overlap any ground-truth object, or do so with an IoU below 0.1, are labelled “background”. From these, the RPN outputs a set of good object proposals.
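The foreground/background assignment can be sketched as below (thresholds 0.5 and 0.1 as stated; anchors falling between the two are typically ignored during training):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def label_anchor(anchor, ground_truths, fg_thresh=0.5, bg_thresh=0.1):
    """Return 'foreground', 'background', or 'neutral' (ignored in training)."""
    best = max((iou(anchor, gt) for gt in ground_truths), default=0.0)
    if best > fg_thresh:
        return "foreground"
    if best < bg_thresh:
        return "background"
    return "neutral"

gts = [(0, 0, 10, 10)]
print(label_anchor((0, 0, 10, 10), gts))    # foreground
print(label_anchor((50, 50, 60, 60), gts))  # background
```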

The RPN is implemented efficiently in a fully convolutional way, using the convolutional feature map returned by the base network as input. First, we use a convolutional layer with 512 channels and a 3×3 kernel; then we have two parallel convolutional layers with 1×1 kernels, whose number of channels depends on the number of anchors per point. For the classification layer, we output two predictions per anchor: the score of it being background (not an object) and the score of it being foreground (an actual object).

For the regression, or bounding box adjustment layer, we output 4 predictions per anchor: the deltas Δx_center, Δy_center, Δwidth, Δheight, which are applied to the anchors to obtain the final proposals.
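Under the standard Faster R-CNN parameterisation, the centre deltas are scaled by the anchor size and the width/height deltas live in log space. A sketch of applying them:

```python
import math

def apply_deltas(anchor, deltas):
    """Apply (dx, dy, dw, dh) regression outputs to an anchor (x1, y1, x2, y2).

    Standard Faster R-CNN parameterisation: centre offsets are scaled by the
    anchor width/height; size deltas are exponentiated from log space.
    """
    x1, y1, x2, y2 = anchor
    wa, ha = x2 - x1, y2 - y1
    cxa, cya = x1 + wa / 2, y1 + ha / 2
    dx, dy, dw, dh = deltas
    cx, cy = cxa + dx * wa, cya + dy * ha
    w, h = wa * math.exp(dw), ha * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# Zero deltas leave the anchor unchanged.
print(apply_deltas((96, 96, 104, 104), (0, 0, 0, 0)))  # (96.0, 96.0, 104.0, 104.0)
```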

Using the final proposal coordinates and their “objectness” score we then have a good set of proposals for objects.

4. Non-max suppression : Since anchors usually overlap, proposals also end up overlapping over the same object. To remove these duplicate proposals we use a simple algorithmic approach called Non-Maximum Suppression (NMS). NMS iterates over the list of proposals sorted by score, discarding any proposal whose IoU with a higher-scoring proposal exceeds the threshold of 0.9. The top 300 proposals are kept after applying NMS.
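A minimal greedy NMS implementation matching the description above (score-sorted list, configurable IoU threshold, top-N cutoff):

```python
def nms(proposals, iou_threshold=0.9, top_n=300):
    """Greedy non-maximum suppression.

    proposals: list of (score, (x1, y1, x2, y2)) pairs.
    Returns the kept proposals, highest score first.
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    kept = []
    for score, box in sorted(proposals, key=lambda p: p[0], reverse=True):
        # Keep a proposal only if it does not heavily overlap a better one.
        if all(iou(box, k) <= iou_threshold for _, k in kept):
            kept.append((score, box))
        if len(kept) == top_n:
            break
    return kept

props = [(0.9, (0, 0, 10, 10)), (0.8, (1, 0, 11, 10)), (0.7, (50, 50, 60, 60))]
print([s for s, _ in nms(props, iou_threshold=0.5)])  # [0.9, 0.7]
```

With a tight threshold such as 0.5 the near-duplicate 0.8-scored box is suppressed; with the project's looser 0.9 threshold more overlapping proposals survive into the top 300.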

5. Region Of Interest Pooling : Now we have a bunch of object proposals with no class assigned to them. In this step, we take each proposal, crop it, and then pass it to the final classification R-CNN layer to label the proposals.

A simpler method, widely used by object detection implementations including Luminoth’s Faster R-CNN, is to crop the convolutional feature map using each proposal and then resize each crop to a fixed size of 14×14×convdepth using interpolation (usually bilinear). After cropping, max pooling with a 2×2 kernel produces a final 7×7×convdepth feature map for each proposal.

The reason for choosing those exact shapes is related to how the output is used by the next block (R-CNN). It is important to understand that these sizes are customizable depending on the second stage.
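The crop, resize, and pool steps can be sketched with NumPy; nearest-neighbour resizing stands in here for the bilinear interpolation used in practice:

```python
import numpy as np

def roi_pool(feature_map, box, crop_size=14):
    """Crop a proposal from a (H, W, C) feature map, resize to crop_size x
    crop_size (nearest-neighbour; real implementations use bilinear),
    then 2x2 max pool down to (crop_size/2, crop_size/2, C)."""
    x1, y1, x2, y2 = box
    crop = feature_map[y1:y2, x1:x2, :]
    # Nearest-neighbour resize to a fixed spatial size.
    rows = np.arange(crop_size) * crop.shape[0] // crop_size
    cols = np.arange(crop_size) * crop.shape[1] // crop_size
    resized = crop[rows][:, cols]
    # 2x2 max pooling over the spatial dimensions.
    h = resized.reshape(crop_size // 2, 2, crop_size // 2, 2, -1)
    return h.max(axis=(1, 3))

fmap = np.random.rand(38, 50, 512).astype(np.float32)  # e.g. VGG-16 feature map
pooled = roi_pool(fmap, (5, 3, 30, 25))
print(pooled.shape)  # (7, 7, 512)
```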


6. Region-Based Convolutional Neural Network :

Region-based convolutional neural network (R-CNN) is the final step in Faster R-CNN’s pipeline. After getting a convolutional feature map from the image, using it to get object proposals with the RPN and finally extracting features for each of those proposals (via RoI Pooling), we finally need to use these features for classification. R-CNN tries to mimic the final stages of classification CNNs where a fully-connected layer is used to output a score for each possible object class.

R-CNN has two different goals:

  1. Classify each proposal into one of the classes, plus a background class (for removing bad proposals).
  2. Better adjust the bounding box of each proposal according to the predicted class.

7. Training : Once the model is ready, we start training by passing the train_annotation.txt file, initialising the base model weights, and passing the number of classes to the model. Accuracy is the metric used while training. The model was trained for 100 epochs, with each epoch taking around 2 hours, and achieved an accuracy of 93%. Four different losses are recorded: two for the RPN layer and two for the R-CNN layer. The model was trained using Stochastic Gradient Descent, which resulted in:

Classifier accuracy for bounding boxes from RPN: 0.934
Loss RPN classifier: 0.08543467678808205
Loss RPN regression: 0.02398829758935608
Loss Detector classifier: 0.1817812535882481
Loss Detector regression: 0.03701828640169697
Total loss: 0.3282225143673832

8. Test Data Evaluation

The evaluation is done using the standard Mean Average Precision (mAP) at an IoU threshold of 0.7. Using the trained model saved in the previous step, bounding boxes were predicted for the test images by passing the test annotation file and its class labels.

Precision was calculated on all 120 test images to assess the prediction of malaria cells in each image.

Elapsed time = 3.3739821910858154
red blood cell AP: 0.9098086551867395
gametocyte AP: 0.7333333333333333
trophozoite AP: 0.7837234722362358
ring AP: 0.7036900726267535
schizont AP: 0.44
difficult AP: 0.056179775280898875
mAP = 0.6044558847773268
Elapsed time = 3.3699944019317627
red blood cell AP: 0.9095687056598919
gametocyte AP: 0.7333333333333333
trophozoite AP: 0.7884311726729429
ring AP: 0.7036900726267535
schizont AP: 0.4230769230769231
difficult AP: 0.053763440860215055
mAP = 0.60197727470501
Elapsed time = 3.3679988384246826
red blood cell AP: 0.9098503831773301
gametocyte AP: 0.7333333333333333
trophozoite AP: 0.7884311726729429
ring AP: 0.7068164731280739
schizont AP: 0.4230769230769231
difficult AP: 0.053763440860215055
mAP = 0.6025452877081364
Elapsed time = 3.742985725402832
red blood cell AP: 0.9101059579135993
gametocyte AP: 0.7333333333333333
trophozoite AP: 0.7884311726729429
ring AP: 0.7135207354597167
schizont AP: 0.4230769230769231
difficult AP: 0.053763440860215055
mAP = 0.6037052605527884
Elapsed time = 3.854698419570923
red blood cell AP: 0.9101823806717539
gametocyte AP: 0.7333333333333333
trophozoite AP: 0.7884311726729429
ring AP: 0.7146642333156751
schizont AP: 0.4230769230769231
difficult AP: 0.053763440860215055
mAP = 0.6039085806551406
Elapsed time = 3.3181307315826416
red blood cell AP: 0.9095249366698459
gametocyte AP: 0.75
trophozoite AP: 0.7899554367201427
ring AP: 0.7146642333156751
schizont AP: 0.4230769230769231
difficult AP: 0.05319148936170213
mAP = 0.6067355031907148
Elapsed time = 3.3460564613342285
red blood cell AP: 0.9097258074309477
gametocyte AP: 0.7058823529411765
trophozoite AP: 0.7914580101784912
ring AP: 0.7169115123945246
schizont AP: 0.4230769230769231
difficult AP: 0.05319148936170213
mAP = 0.6000410158972942
Elapsed time = 3.3121471405029297
red blood cell AP: 0.9099155549306795
gametocyte AP: 0.7058823529411765
trophozoite AP: 0.7914580101784912
ring AP: 0.7186625010102929
schizont AP: 0.4230769230769231
difficult AP: 0.05263157894736842
mAP = 0.6002711535141553
Elapsed time = 3.4268405437469482
red blood cell AP: 0.9094612324851495
gametocyte AP: 0.7058823529411765
trophozoite AP: 0.7858513244702504
ring AP: 0.7220951977234226
schizont AP: 0.4230769230769231
difficult AP: 0.05263157894736842
mAP = 0.5998331016073818

The Mean Average Precision achieved across the test images with the trained model is around 60%.
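As a sanity check, the reported mAP is just the unweighted mean of the six per-class APs; recomputing it from the first evaluation run above:

```python
# Per-class APs from the first evaluation run logged above.
aps = {
    "red blood cell": 0.9098086551867395,
    "gametocyte": 0.7333333333333333,
    "trophozoite": 0.7837234722362358,
    "ring": 0.7036900726267535,
    "schizont": 0.44,
    "difficult": 0.056179775280898875,
}

# mAP = unweighted mean of the per-class average precisions.
map_score = sum(aps.values()) / len(aps)
print(round(map_score, 6))  # ~0.604456, matching the reported mAP
```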

9. References

  1. GitHub code repository :
  2. Reference Links :

Thank you