Source: Deep Learning on Medium
DETECTION AND QUANTIFICATION OF MALARIA PARASITE USING NEURAL NETWORKS
MY LEARNING ARCHIVES FOR INDABAX 2019 PROJECT
These are the building processes of my malaria detection and severity level assessment project. This documentation covers every detail and procedure taken to achieve this feat. It is simplified and detailed enough that even a novice in machine learning can follow along, understand it, and make further upgrades to the model. This is an open-source project, so every improvement is welcome. Below is the link to the full project document.
The convolutional implementation procedure goes through four(4) stages:
1. CONVOLUTION
2. MAX POOLING
3. FLATTENING
4. FULL CONNECTION
All of these will be explained as we build and implement the project.
We are using the KERAS and TENSORFLOW packages and libraries to carry out our convolutions. We started by importing the necessary libraries as follows:
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
It is worth noting that our malaria cell image dataset has already been pre-processed: we carefully split the images into infected and uninfected classes. Furthermore, we split the data 80/20 into a training set and a test set, used for training and for validating and measuring accuracy respectively.
We use convolutional networks to classify images and other 2D datasets. After data pre-processing, these are the major steps we carry out to build the convolutional network:
STEP 1 CONVOLUTION:
This involves using different filters (feature detectors) to capture the different features that will be used to classify our image dataset. Each feature detector produces one feature map, so the number of feature maps equals the number of feature detectors.
Note that the feature detector scans the image pixel by pixel.
[Figure placeholder: illustration of the convolution operation]
The result of convolution is not only feature detection but also feature preservation. It discards unnecessary detail and keeps the features that matter to the learning process. As a side effect, the image is reduced in size: sliding a 3×3 feature detector over a 7×7 matrix, for example, yields a 5×5 matrix. The network does this in order to focus on the necessary features of the image. All the feature maps together form a convolutional layer, which is then passed on to the next stage. We created many feature maps, one per filter (feature detector), in order to obtain our first convolutional layer.
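To make the size reduction concrete, here is a minimal NumPy sketch of a single feature detector sliding over an image. The 7×7 image and the all-ones 3×3 detector are made-up values for illustration, and the function name convolve2d_valid is our own; Keras does the equivalent internally with learned filter values.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like Keras, this multiplies each patch by the kernel element-wise
    and sums the result (strictly speaking, a cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Hypothetical 7x7 image with a bright 3x3 square in the centre
image = np.zeros((7, 7))
image[2:5, 2:5] = 1.0
detector = np.ones((3, 3))  # hypothetical 3x3 feature detector

feature_map = convolve2d_valid(image, detector)
print(feature_map.shape)  # (5, 5)
```

The feature map is largest at its centre, where the detector sits exactly on the bright square, which is what "detecting a feature" means here.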
# Initializing the CNN
classifier = Sequential()  # here we create an instance of the Keras Sequential model
We add our convolutional layer to the Sequential model using the syntax:
classifier.add(Convolution2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
This is where we specify the number of filters (feature detectors) we want to apply, because one feature map is created for each filter used. Since it is common practice to start with 32 feature maps, we started with that, and then defined the number of rows and columns of our feature detector (3 x 3). In Keras 2 this size is passed as the tuple (3, 3); Keras 1 used the form Convolution2D(32, 3, 3, ...).
We also pass in the input shape. First come the dimensions of our 2D arrays, 64×64. This is small, but we had to start small because this project is being built and tested on a CPU as its development environment; as we progress, we will test it on higher dimensions. The '3' in input_shape is the number of color channels in the images we pass into our CNN, so their RGB colors are preserved. Note that the input_shape values come in this order because we are using the TensorFlow backend; with the Theano backend they must come in the reverse order, (3, 64, 64).
Next is the activation function. The one we use in our CNN is the rectifier function, ReLU: it replaces negative values in the feature maps with zero and strengthens non-linearity, because classifying images is a non-linear problem. Now our convolutional layer is ready to be added to our CNN, so just select the whole line of code we just made and run it! Done! Convolutional layer added! Now that we have taken care of the convolution step, next is the pooling step.
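As a quick check on what this layer produces, the sketch below works out the output shape and the number of trainable parameters by hand. It assumes the Keras defaults of stride 1 and no padding, and the two helper functions are our own, not part of Keras.

```python
def conv2d_output_shape(input_shape, n_filters, kernel_size, stride=1):
    """Output shape of a 2D convolution with no padding (channels last)."""
    height, width, _channels = input_shape
    out_h = (height - kernel_size) // stride + 1
    out_w = (width - kernel_size) // stride + 1
    return (out_h, out_w, n_filters)

def conv2d_param_count(input_shape, n_filters, kernel_size):
    """Weights per filter (k * k * channels) plus one bias per filter."""
    channels = input_shape[2]
    return n_filters * (kernel_size * kernel_size * channels + 1)

input_shape = (64, 64, 3)  # 64x64 RGB image, as in our project
print(conv2d_output_shape(input_shape, 32, 3))  # (62, 62, 32)
print(conv2d_param_count(input_shape, 32, 3))   # 896
```

So 32 filters give 32 feature maps of 62×62 each, which is the "one feature map per filter" rule in numbers.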
STEP 2 MAX POOLING
The pooling step is quite easy; it just consists of reducing the size of the feature maps created in the convolution phase. A 2×2 window is moved across each feature map, and at each position it takes the maximum of the 4 values it covers. Each maximum is written into a new, smaller matrix (3×3, or another size as the case may be) called the POOLED FEATURE MAP. Pooling is applied to every feature map, so we end up with the same number of pooled feature maps as we had feature maps after the convolution step; the only thing that changes is their size. Max pooling also gives the feature maps a degree of spatial invariance, so that a feature is still detected when it appears at a different position or inclination in the image.
It is also worth noting that the window slides with a stride of two, not one. That is to say, the 2×2 window jumps two columns at a time, so it never rests on the same cells twice. After applying max pooling to all our feature maps, we have a new layer composed of all these reduced feature maps, and that layer is called the POOLING LAYER.
One major reason why we do max pooling is that reducing the size of the feature maps reduces the number of nodes in the fully connected layers of the network, so complexity and computation time go down while performance is largely maintained. If the feature maps were not reduced, we would have far more nodes to deal with, which in turn affects the scalability of our model. Now we go to the implementation:
classifier.add(MaxPooling2D(pool_size=(2, 2)))
This is the line of code we used to apply pooling to our convolved feature maps. Notice the pool_size parameter passed to the MaxPooling2D function: it is the argument that sets the size of the window that moves over each feature map with a stride of two. The (2, 2) indicates that we used a 2×2 window for our project. So, let's run it!
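For anyone who wants to see the pooling arithmetic itself, here is a small NumPy sketch of 2×2 max pooling with a stride of two. The 4×4 feature map is made up, and the helper max_pool_2x2 is our own; it drops a trailing odd row or column, which is an assumption matching the Keras default padding rather than something stated above.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; a trailing odd row/column is dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Group the map into non-overlapping 2x2 blocks, then take each block's max
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 feature map
feature_map = np.array([
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
])
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[6 8]
#  [3 4]]
```

The pooled map has a quarter of the values but still holds the strongest response from each region.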
STEP 3 FLATTENING
The flattening step is all about taking all our pooled feature maps and laying them out as a single 1-D vector, which serves as the input layer of our ANN in the next phase. This is going to be a huge vector: even though pooling halved the width and height of each feature map, we still have many pooled feature maps to pass into the ANN input layer. The good part of the convolution and pooling we did is that the distinctive features of our input images were preserved and stored as high numbers in the pooled feature maps, and each of those values now becomes one node of the input layer. Let's go ahead and implement that in our project with the following line of code, which this time takes no parameters:
classifier.add(Flatten())
Now the huge 1-D vector is created, holding all the special features in its nodes. What we have left is to create a classic ANN that will classify the input images according to the features it has learnt.
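To get a feel for how huge the vector is, here is a minimal NumPy sketch. The 31×31×32 shape is what one 3×3 convolution followed by one 2×2 pooling would give for a 64×64 input with Keras default settings; the array contents are placeholder values, not real feature maps.

```python
import numpy as np

# Hypothetical stack of pooled feature maps: 31 x 31 pixels, 32 maps
pooled_maps = np.arange(31 * 31 * 32, dtype=np.float32).reshape(31, 31, 32)

# Flattening lays every value out in one long 1-D vector
flat_vector = pooled_maps.reshape(-1)
print(flat_vector.shape)  # (30752,)
```

Each of those 30,752 values becomes one input node of the ANN.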
STEP 4 FULL CONNECTION
The full connection phase consists of building a classic ANN made up of fully connected layers, with the 1-D vector as its input layer. We use an ANN here because it is a strong classifier for non-linear problems, and since we are dealing with a non-linear classification problem, it comes in handy for telling us whether each image is infected or not. Since we already have our input layer, all that is left is to create a hidden layer, called the FULLY CONNECTED LAYER, and then an OUTPUT layer, as is usual when building an ANN. The output layer is expected to give us a probability for a binary outcome, hence our need to use the sigmoid function in it. The following lines of code create the hidden layer and the output layer:
classifier.add(Dense(units=128, activation='relu'))  # creates the hidden layer, i.e. the fully connected layer
classifier.add(Dense(units=1, activation='sigmoid'))  # creates the output layer
As seen above, the classifier's add method receives a 'Dense' object; Dense is the class we use to create a fully connected layer. It expects a parameter called "units" (named "output_dim" in Keras 1), which is the number of nodes in the layer. Of course, there is no rule on how many nodes to assign to your hidden layer, but you don't want it to be too small considering the number of nodes in your input layer, nor too large considering computation. Common practice suggests a number around 100; we used 128. Next comes the activation function. We used the rectifier function "relu" to increase non-linearity. Note that we don't use sigmoid here.
Next is the output layer. Since we are expecting a single outcome, infected or not, we set the number of nodes to one (you would use more only when expecting more than one outcome), and we use the sigmoid function because we expect a probability as output.
Now we are done with all 4 steps, and what we need to do next is compile the whole model.
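Here is a tiny NumPy sketch of what the two Dense layers compute: a ReLU hidden layer followed by a single sigmoid node. The sizes (8 inputs, 4 hidden nodes) and the random weights are made up to keep it readable, and which class counts as 1 is an assumption; in the real model the weights are learned and the hidden layer has 128 nodes.

```python
import numpy as np

def relu(x):
    """Rectifier: negative values become zero."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
features = rng.random(8)                  # hypothetical flattened input vector
w_hidden = rng.normal(size=(8, 4)) * 0.1  # hypothetical hidden-layer weights
b_hidden = np.zeros(4)
w_out = rng.normal(size=(4, 1)) * 0.1     # hypothetical output-layer weights
b_out = np.zeros(1)

hidden = relu(features @ w_hidden + b_hidden)     # fully connected layer
probability = sigmoid(hidden @ w_out + b_out)[0]  # output layer, one node

# Assumption: class 1 means "infected"; threshold the probability at 0.5
label = "infected" if probability > 0.5 else "uninfected"
print(label)
```

The single sigmoid node is enough for two classes because the probability of the other class is simply one minus the output.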
COMPILING THE WHOLE MODEL
The following code line compiles our model:
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Still using the classifier object, we call its compile function this time and pass in the parameters that configure training. The most important is the optimizer, which selects the stochastic gradient descent algorithm to use; ours is 'adam'. The loss, 'binary_crossentropy', suits our two-class problem, and 'accuracy' is the metric we report.
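For the curious, the sketch below shows roughly what the binary cross-entropy loss measures, written in plain Python. The labels and predictions are made up, and the function is our own simplified version (with clipping to avoid log(0)), not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of predictions."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]             # hypothetical ground truth
confident = [0.9, 0.1, 0.8, 0.2]  # hypothetical good predictions
unsure = [0.6, 0.4, 0.5, 0.5]     # hypothetical hesitant predictions

print(binary_crossentropy(labels, confident))
print(binary_crossentropy(labels, unsure))
```

The confident, correct predictions give the lower loss, and that is the quantity the optimizer tries to push down during training.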
WE MOVE TO OUR DATA PRE-PROCESSING
PART 2 FITTING OUR CNN TO OUR IMAGES
We are going to apply IMAGE AUGMENTATION to our data to reduce overfitting. Overfitting occurs when you have great accuracy on your training set but poor accuracy on your test set. The image augmentation is done with Keras's image preprocessing tools.
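The idea behind augmentation can be sketched in a few lines of NumPy. This is only an illustration of two typical transforms (rescaling pixel values and random flips), not the Keras code used in the project; the augment function, the random image and the choice of transforms are all assumptions.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped copy of `image`, rescaled to [0, 1]."""
    augmented = image.astype(np.float32) / 255.0
    if rng.random() < 0.5:
        augmented = augmented[:, ::-1, :]  # horizontal flip
    if rng.random() < 0.5:
        augmented = augmented[::-1, :, :]  # vertical flip
    return augmented

rng = np.random.default_rng(42)
# Hypothetical 64x64 RGB cell image with random pixel values
cell_image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Each call can produce a different variant of the same image
batch = np.stack([augment(cell_image, rng) for _ in range(4)])
print(batch.shape)  # (4, 64, 64, 3)
```

Because the network rarely sees exactly the same picture twice, it is harder for it to memorize the training set.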
We were able to fit the model to our images; the training and test accuracies from each run are reported further down.
If the accuracy falls short of what we want, the model needs some additional tweaking.
This tweaking consists of adding an extra convolutional layer and pooling layer, which we did, and it increased the accuracy by ……. Playing around with this deeper neural network gave us better accuracy on both the training and the test set. So, when tweaking, you can add an extra convolutional layer (followed by a pooling layer), an extra fully connected layer, or both. We play around with these and see which one gives us the best accuracy.
NOW OUR CNN ON MALARIA DETECTION IS READY WITH ACCURACY OF ………..
So, it took my PC, with an 8GB hard disk, 4 hours to train a CNN model on 10,000 images (8,000 train and 2,000 test)……whoop! And we are good to go!!!
We used the first 4,000 cell images of the parasitized class for the training set and the next 1,000 parasitized cell images for the test set, and did the same for the unparasitized class.
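The split described above can be sketched with simple list slicing. The filenames here are invented placeholders and split_class is our own helper; the real project works on the actual image files in each class folder.

```python
def split_class(filenames, n_train=4000, n_test=1000):
    """First n_train files go to training, the next n_test to testing."""
    train = filenames[:n_train]
    test = filenames[n_train:n_train + n_test]
    return train, test

# Hypothetical filenames standing in for the real cell images
parasitized = [f"parasitized_{i:05d}.png" for i in range(5000)]
uninfected = [f"uninfected_{i:05d}.png" for i in range(5000)]

p_train, p_test = split_class(parasitized)
u_train, u_test = split_class(uninfected)

train_set = p_train + u_train
test_set = p_test + u_test
print(len(train_set), len(test_set))  # 8000 2000
```

Taking the same number of images from each class keeps both sets balanced between infected and uninfected cells.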
So, I started training and testing the CNN model at 9:30pm and finished at 11:45pm, approximately 2 hours 15 minutes.
This training was done without an extra convolutional and pooling layer.
So, for 8,000 training images and 2,000 test images, with a computation time of 2 hours 15 minutes, I got a training accuracy of 95% and a test accuracy of 92%……WHOOP!!!
The second training was done with 26,000 images, 20,000 in the training set and 6,000 in the test set, again without extra pooling and convolutional layers. The training started at 8:55pm and ended at 11:15am the next day, taking about 14 hours of computation time. Ideally, it should have taken about 5 hours 30 minutes, but due to frequent power outages in this part of the world it took far longer than normal. We got an accuracy of 95% on the training set and 92% on the test set.
Next, I added an extra convolutional layer and a pooling layer. The computation started at 2:32pm and ended at 11:10pm, prolonged once again by frequent power outages. We had a training accuracy of 96% and a test accuracy of 95%. This shows no sign of overfitting, as the margin between our training and test accuracy is small.
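The overfitting check boils down to comparing the gap between training and test accuracy. The short sketch below does that for the two 26,000-image runs, using the accuracies reported above; the run labels are our own wording.

```python
# Accuracies (in percent) reported for the two 26,000-image runs
runs = {
    "one conv/pool block": {"train": 95, "test": 92},
    "two conv/pool blocks": {"train": 96, "test": 95},
}

gaps = {}
for name, acc in runs.items():
    gaps[name] = acc["train"] - acc["test"]
    print(f"{name}: gap = {gaps[name]} percentage points")
```

A small gap like these suggests the model generalizes about as well as it fits the training data; a large gap would be the warning sign.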
An untrained person can just take his/her smartphone, snap a picture through the microscope and run it against the model installed as an app on the microscope. This can also improve the medical experience even for trained personnel.
NB (note to self, Chuka): the error on the test set was lowest at the 13th epoch, so experiment with training the NN for just 13 epochs to see how it behaves.
Different regularization techniques in deep learning :
1. L2 and L1 regularization
2. Dropout
3. Data Augmentation
4. Early Stopping
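Of these, early stopping is the easiest to sketch in plain Python: keep track of the best validation loss and stop once it has not improved for a few epochs. The loss values, the patience of 3 and the function name best_epoch are all made up for illustration; they are not measurements from this project.

```python
def best_epoch(val_losses, patience=3):
    """Return (epoch with the lowest loss so far, epoch where training stops).

    Epochs are 1-based. Training stops once the loss has failed to improve
    for `patience` epochs in a row, or when the list runs out.
    """
    best_loss = float("inf")
    best = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best, epoch
    return best, len(val_losses)

# Hypothetical validation-loss curve: improves, then starts to rise
val_losses = [0.60, 0.45, 0.38, 0.33, 0.31, 0.32, 0.34, 0.35, 0.36]
print(best_epoch(val_losses))  # (5, 8)
```

Here the loss bottoms out at epoch 5 and training would stop at epoch 8, after three epochs without improvement.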