Performance analysis of a CNN object detector for blood cell detection and counting

Original article was published on Deep Learning on Medium



Counting the white blood cells (WBCs), red blood cells (RBCs), and platelets in a blood sample is an important test for evaluating a person's health. A deficiency or excess of any of these can be a sign of disease and can lead to more severe health problems later. For instance, WBCs protect the body against diseases and external invaders. These cells occupy around 1% of the volume of the blood. An increased WBC count can indicate a health problem such as an infection, an allergy, or leukemia. RBCs carry oxygen through the body and make up around 45% of the blood volume. An insufficient number of these cells causes anemia. Platelets are responsible for forming clots to stop bleeding.

Several methods have been proposed for detecting and counting blood cells in microscopic images. The method in [1] detects red blood cells through the following main steps: histogram equalization, contrast and brightness adjustment, edge detection, dilation, gap filling, erosion, cell extraction, border padding, centroid filling, identifying WBCs based on their color, and using a neural network to count the RBCs. The method in [2] counts the RBCs in sample images using noise removal, red plane extraction, edge detection, morphological filling, and the circular Hough transform. [3] uses watershed segmentation to detect RBCs, and [4] uses the Hough transform. [5] uses the YOLO object detection algorithm to detect RBCs, WBCs, and platelets. [6] uses regional convolutional neural networks to detect and classify different types of WBCs. [7] uses CycleGAN to improve the blood cell image dataset.

Recent methods focus more on deep convolutional neural networks (CNNs) for detecting and counting the different cells. CNNs are trained end to end on labeled data, and the convolution filters are tuned automatically to extract the relevant features from the images. For the object detection task, these networks are trained on an annotated dataset to predict the location and the type of each object. Object detection algorithms are classified into two-stage and one-stage detectors. Faster R-CNN is one of the most popular two-stage detectors, and YOLO, SSD, and RetinaNet are examples of one-stage detectors. One-stage detectors are faster than two-stage detectors. Among them, RetinaNet has shown better performance in terms of accuracy and speed, so I use this detector for the detection of the blood cells. RetinaNet uses a feature pyramid network (FPN), which extracts image features at different scales. The FPN is built on top of a backbone convolutional neural network such as ResNet50. [8] The network has one output head for regression and another for classification at every level of the feature pyramid. It is trained with a specific loss function, the focal loss, which handles imbalanced data better. The focal loss particularly helps training on the WBCs and platelets, which are rare compared to RBCs.

I train this network on the Blood Cell Count and Detection (BCCD) dataset, an open-source dataset available online. [13] The trained model was evaluated on the test set using popular detection metrics.

Materials and Method


The Blood Cell Count and Detection (BCCD) dataset was used for all experiments. This dataset is publicly available at [13] and contains 364 annotated images of size 640×480 pixels. I split the dataset into three sets: 250 images for training, 50 for validation, and 64 for the final test of the trained model.
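A split with these sizes could be produced along the following lines. This is only a sketch: the article does not say how the images were assigned to each set, so the seeded random shuffle, the `split_dataset` helper, and the placeholder file names are all assumptions made for illustration.

```python
import random


def split_dataset(image_ids, n_train=250, n_val=50, n_test=64, seed=0):
    """Shuffle image IDs with a fixed seed and cut them into train/val/test lists."""
    if n_train + n_val + n_test != len(image_ids):
        raise ValueError("split sizes must add up to the number of images")
    ids = sorted(image_ids)            # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)   # seeded shuffle, leaves the global RNG alone
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 364 BCCD images.
image_ids = [f"image_{i:03d}.jpg" for i in range(364)]
train_ids, val_ids, test_ids = split_dataset(image_ids)
print(len(train_ids), len(val_ids), len(test_ids))
```

Fixing the seed keeps the test set identical across runs, which matters when results from different training runs are compared.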


RetinaNet was proposed in 2017 and outperformed existing detectors on the COCO dataset. It achieves its high accuracy through the focal loss, which takes the class imbalance of the dataset into account. Its head has two sub-networks, one for bounding box regression and one for classification. For bounding box predictions, it uses anchor boxes with different sizes and aspect ratios. Anchors are assigned to ground-truth objects based on the intersection-over-union (IoU) metric. If the IoU is at least 0.5, the anchor is assigned to the object, and if the IoU is lower than 0.4, it is assigned to the background. Anchors with an IoU in [0.4, 0.5) are ignored during training. RetinaNet uses a feature pyramid network (FPN) as its backbone to extract features at different resolutions. The FPN is built on top of a CNN such as ResNet50. ResNets use skip connections, which help in training deeper networks without losing accuracy. At each pyramid level, predictions with a confidence below 0.05 are discarded and at most the top 1000 predictions are kept. Finally, non-maximum suppression with a threshold of 0.5 is applied to all the remaining predictions.
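The anchor assignment rule can be illustrated with a small pure-Python sketch. It is not the code used for this project: the `(x1, y1, x2, y2)` box format, the `assign_anchors` helper, and the label codes `-1` (background) and `-2` (ignored) are assumptions for illustration. An IoU of exactly 0.5 is treated as a match, consistent with the half-open interval [0.4, 0.5) for ignored anchors.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


BACKGROUND, IGNORE = -1, -2  # hypothetical label codes


def assign_anchors(anchors, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Return one label per anchor: index of the matched ground-truth box,
    BACKGROUND if the best IoU is below neg_thr, IGNORE otherwise."""
    labels = []
    for anchor in anchors:
        best_iou, best_gt = 0.0, None
        for j, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, j
        if best_iou >= pos_thr:
            labels.append(best_gt)
        elif best_iou < neg_thr:
            labels.append(BACKGROUND)
        else:
            labels.append(IGNORE)
    return labels


gt_boxes = [(10, 10, 50, 50)]
anchors = [(10, 10, 50, 50),      # IoU 1.0  -> matched to ground truth 0
           (10, 10, 50, 28),      # IoU 0.45 -> ignored
           (100, 100, 140, 140)]  # IoU 0.0  -> background
anchor_labels = assign_anchors(anchors, gt_boxes)
print(anchor_labels)  # [0, -2, -1]
```

Ignored anchors contribute nothing to the loss, so the network is not penalized for ambiguous overlaps.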


I evaluated the performance of the trained model using the following metrics.

A detection is a true positive (TP) if its confidence and its IoU with a ground-truth box are both higher than the prespecified thresholds. If the detector fails to detect a ground-truth object, that object is counted as a false negative (FN), and when the detector reports a detection that matches no ground-truth object, it is a false positive (FP).
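These definitions can be turned into counts for one class in one image roughly as follows. The sketch assumes a greedy matching in which detections are visited in order of decreasing confidence and each ground-truth box can be matched at most once; that is one common convention, and the evaluation code used for this project may differ in detail. The `match_detections` helper and the toy boxes are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, gt_boxes, iou_thr=0.5, score_thr=0.5):
    """detections: list of (score, box). Returns (tp, fp, fn) counts."""
    kept = sorted((d for d in detections if d[0] >= score_thr),
                  key=lambda d: d[0], reverse=True)
    matched = set()  # indices of ground-truth boxes already claimed
    tp = fp = 0
    for _, box in kept:
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_boxes):
            if j in matched:
                continue
            overlap = box_iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gt_boxes) - len(matched)  # ground-truth objects never detected
    return tp, fp, fn


# Toy example: two ground-truth cells, three detections.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)),    # matches the first cell
        (0.8, (1, 1, 11, 11)),    # duplicate of the first cell -> FP
        (0.3, (20, 20, 30, 30))]  # below the confidence threshold -> dropped
tp, fp, fn = match_detections(dets, gt)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, precision, recall)  # 1 1 1 0.5 0.5
```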

IoU is the intersection over union of the predicted and ground-truth boxes: the area of their overlap divided by the area of their union. An IoU closer to 1 means better localization by the model. Average precision (AP) is calculated by averaging the precision at different recall values. The mean of this metric over several classes is called the mean average precision (mAP).
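As an illustration of how AP can be computed from ranked detections, the sketch below integrates the precision-recall curve after making precision non-increasing (the "all-point" interpolation). Several AP conventions exist and the article does not state which one was used, so this choice, the `average_precision` helper, and the toy inputs are assumptions.

```python
def average_precision(scores, is_tp, n_gt):
    """AP for one class.

    scores: confidence of each detection
    is_tp:  True if that detection was matched to a ground-truth box
    n_gt:   number of ground-truth boxes of this class
    """
    if n_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp_cum = fp_cum = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp_cum += 1
        else:
            fp_cum += 1
        recalls.append(tp_cum / n_gt)
        precisions.append(tp_cum / (tp_cum + fp_cum))
    # Precision envelope: each point takes the best precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum precision over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy example: four detections, four ground-truth objects, one object missed.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], n_gt=4)
print(ap)  # 0.625
```

The mAP is then the arithmetic mean of the per-class AP values, here over RBCs, WBCs, and platelets.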


The training loss is the sum of a regression loss for the bounding boxes and the focal loss for classification, with the focal loss kept at its original parameters (γ=2 and α=0.25). I start from a model pretrained on the COCO dataset and continue training on the BCCD dataset. The learning rate was set to 1e-4, and the images were fed into the network with a batch size of 1. Training continued until there was no further improvement on the validation set. I used the code available at [14] for this project.
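To make the role of γ and α concrete, here is a minimal scalar sketch of the focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), for a single anchor and class. It is written from the published formula rather than taken from the training code, and the `focal_loss` helper and the clipping constant `eps` are assumptions.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one prediction.

    p: predicted probability that the anchor contains the class
    y: 1 if the anchor is a positive for the class, 0 otherwise
    """
    p = min(max(p, eps), 1.0 - eps)             # avoid log(0)
    p_t = p if y == 1 else 1.0 - p              # probability of the true label
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_negative = focal_loss(0.1, 0)  # background anchor, confidently rejected
hard_positive = focal_loss(0.1, 1)  # real cell that the model almost missed
print(easy_negative < hard_positive)  # True
```

The factor (1 - p_t)^γ shrinks the loss of well-classified examples, so the many easy background anchors do not drown out the few hard ones. With γ=0 the expression reduces to an α-weighted cross-entropy.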


After training the RetinaNet model on the training set, I evaluated its performance on the test set. The test set contains 64 images with 813 instances of RBCs, 65 WBCs, and 57 platelets. Figure 1 shows two test images with the detection boxes drawn on them. As the images show, the model correctly classifies and localizes almost all of the blood cells. The confidence and IoU thresholds for model inference on these images were both set to 0.5.
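Once detections are available, per-class cell counts follow from filtering by confidence and removing duplicate boxes. The sketch below assumes detections arrive as `(label, score, box)` tuples and applies a greedy, class-wise non-maximum suppression before counting. The `nms` and `count_cells` helpers, the class-wise behaviour, and the toy detections are illustrative assumptions, not output of the trained model.

```python
from collections import Counter


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thr=0.5):
    """Greedy NMS: keep the highest-scoring box, drop same-class boxes that overlap it."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        label, _, box = det
        if all(k[0] != label or iou(box, k[2]) <= iou_thr for k in kept):
            kept.append(det)
    return kept


def count_cells(detections, score_thr=0.5, iou_thr=0.5):
    """Count detections per class after confidence filtering and NMS."""
    confident = [d for d in detections if d[1] >= score_thr]
    return Counter(d[0] for d in nms(confident, iou_thr))


# Toy detections for a single image.
detections = [
    ("RBC", 0.95, (0, 0, 10, 10)),
    ("RBC", 0.90, (1, 0, 11, 10)),        # near-duplicate of the box above
    ("WBC", 0.99, (0, 0, 10, 10)),        # different class, never suppressed by an RBC
    ("RBC", 0.70, (30, 30, 40, 40)),
    ("Platelet", 0.30, (50, 50, 55, 55)),  # below the confidence threshold
]
counts = count_cells(detections)
print(counts["RBC"], counts["WBC"], counts["Platelet"])  # 2 1 0
```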

Figure 1: detection results on two images of the test dataset using the trained model.

Figures 2–4 show the precision-recall curve for each type of cell at an IoU threshold of 0.5. The graphs show that the model performs best on WBCs (AP=1). Tables 1 and 2 show the recall and precision of the model for different confidence thresholds (with the IoU threshold set to 0.5). The tables are followed by their corresponding graphs. As expected, increasing the confidence threshold decreases the recall and increases the precision. For WBCs, the recall barely changes, because the model detects all the WBCs at every threshold. However, lowering the threshold increases the number of false positives, and therefore the precision of the model in detecting WBCs decreases. The higher performance of the model on WBCs is due to the larger size of these cells compared to the others. Table 3 shows the AP for each cell type at different IoU thresholds, and Figure 7 shows that the mAP decreases as the IoU threshold increases.
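The trade-off between precision and recall can be reproduced on toy data with a short threshold sweep. The sketch assumes each detection already carries a flag saying whether it matched a ground-truth box at the chosen IoU threshold. The `precision_recall_at` helper and the numbers are made up for illustration and are not the values from the tables, and reporting a precision of 1.0 when no detection survives the threshold is just one possible convention.

```python
def precision_recall_at(scores, is_tp, n_gt, thresholds):
    """Return (threshold, precision, recall) for each confidence threshold."""
    rows = []
    for thr in thresholds:
        tp = sum(1 for s, t in zip(scores, is_tp) if s >= thr and t)
        fp = sum(1 for s, t in zip(scores, is_tp) if s >= thr and not t)
        precision = tp / (tp + fp) if tp + fp else 1.0  # no detections kept
        recall = tp / n_gt if n_gt else 0.0
        rows.append((thr, precision, recall))
    return rows


# Toy detections for one class: confidence and whether it hit a real cell.
scores = [0.95, 0.90, 0.60, 0.40, 0.20]
is_tp = [True, True, False, True, False]
rows = precision_recall_at(scores, is_tp, n_gt=4, thresholds=[0.3, 0.5, 0.7])
for thr, precision, recall in rows:
    print(f"threshold={thr:.1f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy input, raising the threshold from 0.3 to 0.7 lifts the precision from 0.75 to 1.00 while the recall drops from 0.75 to 0.50.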


I successfully trained a RetinaNet model to detect the different types of cells in the blood. The trained model achieved a very good mean average precision on the test set and performed best on the white blood cells. Platelets were harder to detect because of their smaller size, but the model was still able to detect them with good accuracy.


  1. Poomcokrak, Jutarat, and Chatchai Neatpisarnvanit. “Red blood cells extraction and counting.” The 3rd International Symposium on Biomedical Engineering. 2008.
  2. Dvanesh, Varun D., et al. “Blood cell count using digital image processing.” 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT). IEEE, 2018.
  3. Chourasiya, Sumeet, and G. Usha Rani. “Automatic red blood cell counting using watershed segmentation.” Hemoglobin 14 (2014): 17.
  4. Maitra, Mausumi, Rahul Kumar Gupta, and Manali Mukherjee. “Detection and counting of red blood cells in blood cell images using Hough transform.” International journal of computer applications 53.16 (2012).
  5. Alam, Mohammad Mahmudul, and Mohammad Tariqul Islam. “Machine learning approach of automatic identification and counting of blood cells.” Healthcare technology letters 6.4 (2019): 103–108.
  6. Kutlu, Hüseyin, Engin Avci, and Fatih Özyurt. “White blood cells detection and classification based on regional convolutional neural networks.” Medical hypotheses 135 (2020): 109472.
  7. He, Jin, et al. “CycleGAN with an Improved Loss Function for Cell Detection Using Partly Labeled Images.” IEEE Journal of Biomedical and Health Informatics (2020).
  8. Ren, Shaoqing, et al. “Faster R-CNN: Towards real-time object detection with region proposal networks.” Advances in neural information processing systems. 2015.
  9. Redmon, Joseph, et al. “You only look once: Unified, real-time object detection.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  10. Liu, Wei, et al. “SSD: Single shot multibox detector.” European conference on computer vision. Springer, Cham, 2016.
  11. He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  12. Lin, Tsung-Yi, et al. “Focal loss for dense object detection.” Proceedings of the IEEE international conference on computer vision. 2017.