Image Classification Using Deep-Learning — ALL (Acute Lymphoblastic Leukemia) Detection

Source: Deep Learning on Medium

1. Introduction

This article discusses image classification using a deep learning model called a Convolutional Neural Network (CNN). Before getting into the CNN code, I would like to spend some time explaining the architecture of a CNN. This project was done as part of my final year.

2. Architecture

A regular neural network (NN) does not scale well to images. Imagine connecting every pixel to its own input neuron: even a modest image needs thousands of neurons, and fully connecting them to the next layer becomes computationally expensive. A CNN handles images differently, but it still follows the general concept of an NN.

CNNs are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product, and follows it with a non-linearity. The network still has a loss function (e.g. SVM/Softmax) on the last fully connected layer, and all the tips and tricks developed for training regular NNs still apply.

CNNs are the default choice for almost any task involving images. Some recent papers also report using Recurrent Neural Networks (RNNs) for image recognition, although RNNs are traditionally used for text and speech recognition.

Compared with a regular NN, a CNN needs far fewer parameters to process an image. It also shares parameters across image positions, which gives it a degree of translation invariance: a feature can be detected regardless of where it appears in the image.
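To make the parameter saving concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a 450 × 450 image (the dataset size used later in this article) with 3 colour channels, a hypothetical fully connected layer of 1,000 units, and a convolution layer with 32 filters of size 3 × 3. The channel count and the 1,000-unit layer are assumptions chosen only for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter; the filter is shared across all positions."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


# Assumed input: 450 x 450 pixels, 3 colour channels.
n_inputs = 450 * 450 * 3

dense = dense_params(n_inputs, 1000)  # hypothetical 1,000-unit dense layer
conv = conv_params(3, 3, 3, 32)       # 32 filters of size 3 x 3

print(f"dense layer parameters: {dense:,}")
print(f"conv layer parameters:  {conv:,}")
```

The dense layer needs hundreds of millions of weights, while the convolution layer needs fewer than a thousand, because the same small filter is reused at every image position.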


Let’s dive into how a convolution layer works. A convolution layer has a set of learnable filters, each of which is a small matrix with a width, a height, and a depth. We treat the image as a matrix and slide the filter across it; the result is the convolved image (also called a feature map), which is a filtered version of the original image.
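The sliding-filter idea can be sketched in a few lines of plain Python. This is a minimal illustration on a single-channel image with no padding and a stride of 1; the image and filter values are made up, and real frameworks implement the same operation far more efficiently. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    output = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Dot product between the filter and the image patch under it.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Made-up 4 x 4 image and 2 x 2 filter, for illustration only.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

A 4 × 4 input and a 2 × 2 filter give a 3 × 3 feature map, which shows how the output shrinks when no padding is used.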

Depending on the task, the model uses more than one filter so that it can respond to different features. One filter might look for an ALL cell, another for a particular shape, and so on. The values in each filter matrix are learned during the training phase of the model.

Padding is another important factor in convolution. Applying a filter to the input image produces an output matrix that is smaller than the original image. Padding comes into play when we need the output to have the same size as the input.
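The effect of padding on the output size follows the standard formula out = (n + 2p - k) / s + 1 (rounded down), where n is the input size, k the filter size, p the padding, and s the stride. The short sketch below applies it to a 400-pixel-wide input and a 3 × 3 filter; these particular numbers are chosen for illustration.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output width (or height) of a convolution along one dimension."""
    return (n + 2 * padding - k) // stride + 1


def same_padding(k):
    """Padding that keeps the size unchanged for an odd filter size and stride 1."""
    return (k - 1) // 2


without_padding = conv_output_size(400, 3)                        # output shrinks
with_padding = conv_output_size(400, 3, padding=same_padding(3))  # size preserved

print(without_padding, with_padding)
```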

Activation function

An activation function decides, given the inputs into a node, what the node’s output should be. Because the activation function decides the actual output, we often refer to the outputs of a layer as its “activations”. One of the most popular activation functions in CNNs is ReLU.
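ReLU itself is very simple: it passes positive values through unchanged and replaces negative values with zero. A minimal sketch, with made-up input values:

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]  # illustrative values
activations = [relu(v) for v in pre_activations]
print(activations)
```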

Max pooling

Another important concept in CNNs is pooling, which is a form of non-linear down-sampling. Several non-linear functions can be used for pooling, of which max pooling is the most common.

According to Wikipedia, max pooling partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum. The intuition is that the exact location of a feature is less important than its rough location relative to other features.


The pooling layer reduces the spatial size of the representation, which in turn reduces the number of parameters and the amount of computation in the network. This also helps to control over-fitting.
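As an illustration, the following sketch applies 2 × 2 max pooling with non-overlapping windows to a small, made-up feature map. It is written in plain Python for readability rather than speed.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, h - size + 1, size):
        row = []
        for c in range(0, w - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Made-up 4 x 4 feature map.
feature_map_in = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]

pooled = max_pool2d(feature_map_in)
print(pooled)
```

Each 2 × 2 window is replaced by its largest value, so the 4 × 4 map becomes 2 × 2 and only the strongest response in each region is kept.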

Fully connected

Finally, after several convolutional and max pooling layers, the high-level reasoning in the neural network is done via fully connected layers. Neurons in a fully connected layer have connections to all activations in the previous layer, as seen in regular neural networks. Their activations can hence be computed with a matrix multiplication followed by a bias offset (Wikipedia).
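The "matrix multiplication followed by a bias offset" can be written out directly. The sketch below uses a made-up layer with 3 inputs and 2 output neurons; the weights and biases are illustrative values, not taken from the project.

```python
def dense_forward(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [
        sum(w * x for w, x in zip(weight_row, inputs)) + b
        for weight_row, b in zip(weights, biases)
    ]


inputs = [1.0, 2.0, 3.0]       # activations from the previous layer
weights = [
    [0.5, -1.0, 0.0],          # weights of output neuron 0
    [1.0, 1.0, 1.0],           # weights of output neuron 1
]
biases = [0.5, -1.0]

outputs = dense_forward(inputs, weights, biases)
print(outputs)
```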


The softmax function calculates a probability distribution over n different events. In other words, it calculates the probability of each target class over all possible target classes. The calculated probabilities are then used to determine the target class for the given inputs.

Mathematically, the softmax function is softmax(z)_j = exp(z_j) / (exp(z_1) + ... + exp(z_K)), where z is a vector of the inputs to the output layer (if you have 10 output units, then there are 10 elements in z), K is the number of output units, and j indexes the output units.
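A minimal Python sketch of softmax is given below. Subtracting the largest input before exponentiating is a common numerical-stability trick that does not change the result; the input scores are made up for illustration.

```python
import math


def softmax(z):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(z)
    exps = [math.exp(v - largest) for v in z]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # illustrative output-layer inputs
probs = softmax(scores)
predicted_class = probs.index(max(probs))

print(probs, predicted_class)
```

The class with the highest probability is taken as the prediction, which here is the class with the largest raw score.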

Image classification

I will now explain the code base for the project that I worked on throughout my final year.

Download dataset

Since my project is based on ALL detection, i.e. classifying ALL cell images against normal cell images, I use white blood cell (WBC) nuclei images. The dataset was released as part of an ISBI challenge on the CodaLab website. It consists of 10,661 WBC nuclei images covering both categories, i.e. normal cells and ALL cells.

Each image is 450 × 450 pixels.


Data preparation and pre-processing

We organized our data in two folders with the folder names serving as the label for the images in them.

Now we split our data into two sets with a split ratio of 8:2:

Training set — 8,480 images

Testing set — 2,120 images

To get a good split, it is important to pick images randomly without any bias. Instead of doing this manually, we built a custom Python script that takes as input the dataset path, the output path, the split ratio, and optionally a random seed value. For every image category, the script first shuffles all the images randomly and then divides them into a training part and a testing part according to the split ratio.
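The original script is not reproduced here, but the shuffle-then-split logic it describes could look roughly like the sketch below. To stay self-contained it works on lists of file names per class instead of reading and copying files on disk; the function name, class labels, and file names are all hypothetical.

```python
import random


def split_dataset(files_by_class, train_ratio=0.8, seed=None):
    """Shuffle each class separately, then cut it into train and test parts."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    train, test = {}, {}
    for label, files in files_by_class.items():
        shuffled = sorted(files)  # sort first so the result depends only on the seed
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train[label] = shuffled[:cut]
        test[label] = shuffled[cut:]
    return train, test


# Hypothetical file names standing in for the two image folders.
files_by_class = {
    "all": [f"all_{i}.png" for i in range(10)],
    "normal": [f"normal_{i}.png" for i in range(5)],
}

train, test = split_dataset(files_by_class, train_ratio=0.8, seed=42)
print({k: len(v) for k, v in train.items()}, {k: len(v) for k, v in test.items()})
```

Splitting each class separately keeps the class proportions the same in both sets, and passing the same seed again reproduces exactly the same split.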


The borders of the images were cropped as much as possible while making sure that no part of the nucleus was removed from any image. The images were cropped from their original size of 450 × 450 to a new size of 400 × 400. This was done to retain as much information as possible during the subsequent image resize, which was a necessary step owing to the resource constraints of our system.
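One simple way to perform such a crop is to remove an equal border from every side. The sketch below assumes a centre crop, which is an assumption on my part: the text above only says that the borders were removed. It represents the image as a list of pixel rows so that it runs without any imaging library.

```python
def center_crop(image, out_h, out_w):
    """Keep the central out_h x out_w region of an image given as a list of rows."""
    h, w = len(image), len(image[0])
    if out_h > h or out_w > w:
        raise ValueError("crop size must not exceed the image size")
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    return [row[left:left + out_w] for row in image[top:top + out_h]]


# Dummy 450 x 450 "image" whose pixel value encodes its own position.
original = [[r * 450 + c for c in range(450)] for r in range(450)]

cropped = center_crop(original, 400, 400)
print(len(cropped), len(cropped[0]))
```

Going from 450 to 400 pixels removes a 25-pixel border on each side.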

cropped version


To increase the amount of training data for better classification, and to avoid overfitting by ensuring that the model never sees exactly the same picture twice during training, we performed augmentation on our training dataset by applying random rotations and flips to the images. This was done using the “Augmentor” Python library available on GitHub. Using Augmentor, we increased the number of training images from 8,480 to 20,000.
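To show what these transformations do, here is a small stand-alone sketch of random flips and rotations. It does not use Augmentor: it is a simplified illustration that works on a list of pixel rows and only rotates in 90-degree steps, whereas a library can rotate by arbitrary angles. All function names are hypothetical.

```python
import random


def flip_left_right(img):
    return [row[::-1] for row in img]


def flip_top_bottom(img):
    return [row[:] for row in img[::-1]]


def rotate_90(img):
    """Rotate clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]


def random_augment(img, rng):
    """Apply each flip with probability 0.5, then 0 to 3 quarter turns."""
    out = [row[:] for row in img]
    if rng.random() < 0.5:
        out = flip_left_right(out)
    if rng.random() < 0.5:
        out = flip_top_bottom(out)
    for _ in range(rng.randrange(4)):
        out = rotate_90(out)
    return out


tiny = [[1, 2], [3, 4]]  # stand-in for an image
rng = random.Random(0)
augmented = [random_augment(tiny, rng) for _ in range(5)]
for img in augmented:
    print(img)
```

Every augmented copy contains the same pixels in a different arrangement, which is why flips and rotations are a safe choice for cell images that have no natural orientation.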

normal image
Augmented Images

Build image classification models

Owing to the unsatisfactory results from the available DL models, we decided to develop our own model for the classification task. Our model consists of 17 layers, of which 14 are convolutional layers divided into 6 blocks.


• 14 {convolution + relu} modules

• 5 {Max pooling} modules

• 3 {fully connected + fully connected} modules

• 2 {dropout} modules

• 1 {softmax} module

Convolutions operate on 3×3 windows and max pooling layers operate on 2×2 windows.

The first convolution block uses 32 filters, the second 64, the third 128, the fourth 256, the fifth 384, and the last block 512.
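The block structure can be written down as a simple layer plan. The filter counts per block come from the description above, but the number of convolutions in each block and the position of the pooling layers are not stated, so the values used here are assumptions chosen only to match the totals of 14 convolutions and 5 max pooling layers. The 128-pixel input size and the use of "same" padding are likewise hypothetical.

```python
FILTERS_PER_BLOCK = [32, 64, 128, 256, 384, 512]          # from the description above
CONVS_PER_BLOCK = [2, 2, 2, 2, 3, 3]                      # assumption: sums to 14
POOL_AFTER_BLOCK = [True, True, True, True, True, False]  # assumption: 5 poolings


def build_layer_plan():
    """List the convolution and pooling layers in order."""
    plan = []
    for filters, n_convs, pool in zip(
        FILTERS_PER_BLOCK, CONVS_PER_BLOCK, POOL_AFTER_BLOCK
    ):
        for _ in range(n_convs):
            plan.append(("conv3x3_relu", filters))
        if pool:
            plan.append(("maxpool2x2", None))
    return plan


def spatial_size_after(plan, input_size):
    """Feature-map width, assuming 'same' padding so only pooling shrinks it."""
    size = input_size
    for kind, _ in plan:
        if kind == "maxpool2x2":
            size //= 2
    return size


plan = build_layer_plan()
n_convs = sum(1 for kind, _ in plan if kind == "conv3x3_relu")
n_pools = sum(1 for kind, _ in plan if kind == "maxpool2x2")
final_size = spatial_size_after(plan, 128)  # hypothetical 128 x 128 input

print(n_convs, n_pools, final_size)
```

Under these assumptions, five poolings shrink each side of the feature map by a factor of 32, which keeps the later, wider blocks affordable to compute.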

The model is built using the following layers:

• Conv2D — 2D Convolutional Network

• Activation — Relu

• Batch Normalization

• Max pooling

• Dense — Fully Connected Neural Network

Model Architecture

Network Training

The model is compiled with the training parameters set. The activation function used for the hidden layers is the non-linear function ReLU. The activation function used for the output layer is the probabilistic function softmax.

During training, the SGD optimizer performed best, at a learning rate of 0.0002. The model was trained for 100 epochs with a batch size of 30.
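To get a feel for these settings, the sketch below works out how many weight updates they imply and shows a single plain SGD step. It assumes that all 20,000 augmented training images are used in every epoch, which is not stated explicitly; the weight and gradient values are made up.

```python
import math

LEARNING_RATE = 0.0002
EPOCHS = 100
BATCH_SIZE = 30
N_TRAIN = 20000  # assumption: the full augmented training set


def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def sgd_step(weight, gradient, learning_rate):
    """Plain SGD update: move against the gradient."""
    return weight - learning_rate * gradient


steps = steps_per_epoch(N_TRAIN, BATCH_SIZE)
total_updates = steps * EPOCHS
new_weight = sgd_step(1.0, 0.5, LEARNING_RATE)  # illustrative weight and gradient

print(steps, total_updates, new_weight)
```

With such a small learning rate each update changes a weight only slightly, so a large number of updates is needed for the model to converge.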


The performance metric selected for model evaluation was accuracy. There are four possible outcomes of a classifier: true positives (when cancerous cells are correctly classified), true negatives (when non-cancerous cells are correctly identified), false positives (when non-cancerous cells are classified as cancerous) and false negatives (when cancerous cells are classified as non-cancerous).
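Accuracy follows directly from these four counts: it is the share of correct predictions, (TP + TN) / (TP + TN + FP + FN). The sketch below counts the outcomes for a handful of made-up labels, where 1 stands for a cancerous (ALL) cell and 0 for a normal cell.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


# Made-up labels: 1 = ALL cell, 0 = normal cell.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
print(tp, tn, fp, fn, acc)
```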

Callbacks were used to monitor the validation accuracy and to save the model weights that gave the best validation accuracy. Considering the simplicity of the model architecture, an accuracy rate of 93.02% was achieved.

Table 2 compares the accuracy of several existing models with that of our model.

Architecture      Accuracy (%)
AlexNet           81.62
VGG_16            84.8
ResNet50          84.77
InceptionV3       83.67
Proposed Model    93.02