Original article was published by Gaurav Rajpal on Deep Learning on Medium
IMPROVING DEEP LEARNING MODEL / NEURAL NETWORK.
Here we look at techniques that can be used to improve a deep learning model. We will apply them to our Emergency vs Non-Emergency classification problem and compare the resulting performance.
1. EARLY STOPPING.
The figure above plots the loss against the number of epochs. The blue line shows the training loss and the orange line shows the validation loss. The x-axis shows the epochs and the y-axis shows the loss.
Between epochs 0 and 20, both the training loss and the validation loss decrease. Both start high and fall as training progresses, and they fall by a similar amount, so we can say that they are in sync. Beyond epoch 20, the training loss keeps decreasing while the validation loss either increases or stays the same. This region, where the training loss decreases but the validation loss increases or stays constant, is called overfitting. Here the model memorizes the training data instead of learning the underlying signals. We do not want this to happen, so we need a method that helps the model perform well on validation data and other unseen data.
To solve this problem, we can stop training at the point where the model starts to overfit, i.e. to memorize the training data. This technique is called early stopping, and it helps the model generalize to unseen data. With early stopping, training stops once the validation loss stops decreasing or starts to increase.
To use early stopping we need to choose three things. The first is the metric to track, such as validation loss or validation accuracy. The second is a threshold: the minimum change in the metric that counts as an improvement. The third is the number of epochs to wait: if the metric has not improved over that many epochs, training stops.
For example, take threshold = 0.01 and number of epochs = 5. With these settings, training stops when the validation loss has not decreased by at least 0.01 for 5 successive epochs. We have to be very careful when choosing these parameters: if they are too lenient, training continues for too long and the model overfits again.
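The stopping rule described above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not the code from the notebook: the function name `early_stopping_epoch` and the argument names `min_delta` (the threshold) and `patience` (the number of epochs to wait) are assumptions chosen for this sketch, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, min_delta=0.01, patience=5):
    """Return the 1-based epoch at which training would stop, or None.

    val_losses: validation loss recorded after each epoch.
    min_delta:  smallest decrease that counts as an improvement (threshold).
    patience:   number of successive epochs without improvement to tolerate.
    """
    best = float("inf")   # best validation loss seen so far
    wait = 0              # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if best - loss > min_delta:
            # Loss dropped by more than the threshold: reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # the rule never triggered; training runs to the end


# Made-up losses: improvement up to epoch 4, then a plateau.
losses = [1.0, 0.8, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
stop_at = early_stopping_epoch(losses, min_delta=0.01, patience=5)
```

With threshold = 0.01 and 5 epochs of patience, the last improvement here is at epoch 4, so the rule triggers five epochs later.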
Let us see where early stopping fits into our usual steps for solving the problem.
Let us look at the practical implementation of early stopping:
2. DROPOUT REGULARIZATION.
Previously we discussed one problem with neural networks: after a certain number of epochs the model starts to overfit. We saw how early stopping can resolve it, but early stopping is just one method of preventing overfitting.
Let us first recall what goes wrong when a model overfits, and then come to another solution.
- The model does not learn patterns / signals from the training data.
- The model starts to memorize the training data.
This is where another method of preventing overfitting comes in: dropout regularization.
Recall from our neural network architecture that the hidden layers are responsible for creating features, and some of these features may lead to overfitting because the network memorizes the training data. Dropout drops some of the neurons from a hidden layer, which ensures that no neuron depends too heavily on a few specific features.
Let us understand this more clearly. For each hidden layer where we wish to apply dropout, we assign a probability. This probability is the chance that any given neuron in that layer will be dropped.
How does it help to overcome overfitting?
Without dropout, a particular neuron in a hidden layer may end up being influenced by only one particular feature, which leads to overfitting because the model is then memorizing the training set.
Dropout makes sure that the output of any one neuron is not dependent on any one feature. Because neurons are dropped from the hidden layers at random, each iteration trains a different set of neurons, so no neuron can simply memorize the inputs or be dominated by a single feature.
KEY POINTS TO REMEMBER:
- Neurons are dropped randomly at each iteration.
- Generally, we don’t apply dropout on the input layer.
- No dropout during testing.
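The key points above can be illustrated with a small plain-Python sketch. It is not the notebook's implementation: the function name `dropout` and its arguments are assumptions made for this example. It also uses the common "inverted dropout" variant, in which the surviving activations are scaled up by 1 / (1 - drop_prob) during training so that nothing needs to change at test time; that scaling is an addition to what the text describes.

```python
import random


def dropout(activations, drop_prob, training=True, rng=random):
    """Apply (inverted) dropout to a list of hidden layer activations.

    Each activation is dropped (set to 0.0) with probability drop_prob.
    Survivors are scaled by 1 / (1 - drop_prob) so the expected value of
    each activation is unchanged. At test time the input passes through.
    """
    if not training or drop_prob == 0.0:
        return list(activations)          # no dropout during testing
    keep_prob = 1.0 - drop_prob
    output = []
    for a in activations:
        if rng.random() < drop_prob:
            output.append(0.0)            # neuron dropped for this iteration
        else:
            output.append(a / keep_prob)  # neuron kept and rescaled
    return output


rng = random.Random(0)                    # seeded for reproducibility
hidden = [1.0] * 1000
train_out = dropout(hidden, drop_prob=0.5, rng=rng)
test_out = dropout(hidden, drop_prob=0.5, training=False)
```

Calling `dropout` again during training draws a fresh random mask, which is what gives each iteration a different set of active neurons.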
Let us see where dropout fits into our usual steps for solving the problem.
Let us look at the practical implementation of dropout.
3. BATCH NORMALIZATION.
The technique we discuss here, batch normalization, is one of those most frequently used by deep learning practitioners, and it works well in most cases.
Consider the neural network below:
- X1, X2, X3, X4 represent the input features.
- W1, W2, W3, W4, ..., Wn represent the weights.
- Bias = 0, for the sake of simplicity.
- Activation function = sigmoid.
Recall that we normalize the inputs before sending them to the model, which helps to speed up training. Using these normalized values we perform forward propagation and compute the activations of the hidden layers.
Consider the above figure.
In the figure, the inputs are normalized, but the hidden layer activations are computed by taking dot products with the weights and applying the activation function, so their values are no longer normalized. It would help if these hidden layer activations were normalized too: the model would then need less training time to converge. Batch normalization achieves this.
How does batch normalization happen?
- We calculate the mean of the activations of a particular layer over the current batch.
- Next, we calculate their standard deviation.
- Once we have the mean and standard deviation, we normalize the values: we subtract the mean from each activation and divide by the standard deviation, with a very small value called epsilon added to the denominator to make sure it is never 0. This makes the mean and standard deviation of the hidden layer activations approximately 0 and 1 respectively.
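The three steps above can be written out as a short plain-Python sketch. The function name `batch_normalize` and the default `eps=1e-5` are assumptions for illustration. Here epsilon is added to the variance under the square root, which is one common way of keeping the denominator away from 0. The learnable scale and shift that full batch normalization layers also apply are left out.

```python
import math


def batch_normalize(values, eps=1e-5):
    """Normalize one neuron's activations across a batch.

    values: the activations of a single neuron, one per example in the batch.
    eps:    small constant that keeps the denominator away from zero.
    """
    n = len(values)
    mean = sum(values) / n                               # step 1: mean
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance + eps)                      # step 2: std (with eps)
    return [(v - mean) / std for v in values]            # step 3: normalize


activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_normalize(activations)

# Check the result: mean close to 0, variance close to 1.
new_mean = sum(normalized) / len(normalized)
new_var = sum((v - new_mean) ** 2 for v in normalized) / len(normalized)
```

Because of epsilon, the variance after normalization is very slightly below 1 rather than exactly 1, and a batch in which all activations are equal maps to all zeros instead of causing a division by zero.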
Let us see where batch normalization fits into our usual steps for solving the problem.
Also, in the practical session we saw that with batch normalization our model converged faster and achieved better accuracy in fewer epochs.
What are the advantages of using batch normalization?
- It speeds up the training process by normalizing the hidden layer activations.
- It smooths the loss function / cost function.
Let us look at the practical implementation of the batch normalization.
4. IMAGE AUGMENTATION.
Why do we need image augmentation?
Image augmentation means altering the existing data in order to create more data for the model to train on.
A deep learning model works well when it has a huge amount of data; in general, the more data, the better the model performs. But acquiring a massive amount of data comes with its own challenges, and we will not always have a large amount of data to feed the network.
Recall that in the emergency vs non-emergency classification problem we had just 2300 images. Collecting more data can be too expensive, and the resulting shortage means the deep learning model cannot learn the patterns in the data and may not perform well. This is where image augmentation comes into the picture.
Instead of spending days collecting data manually, we can use image augmentation to generate new images from the original ones. We no longer have to collect all of our data manually, and the number of training samples increases.
Consider the figure below, where the original image from the training data is on the left and the remaining images are samples obtained from it by image augmentation.
IMAGE AUGMENTATION TECHNIQUES.
1. Image Rotation: This is one of the most commonly used techniques. Even if we rotate the image, the information in the image remains the same. E.g. a cat is a cat even if we see it from a different angle.
2. Image Shifting: With this technique we change the position of the object in an image, which gives the model more variety and can lead to a more general model. It is a geometric transformation that maps the position of every object in an image to a new location in the final image. An object at location (x, y) in the input image is shifted to a new position (X, Y), where X = x + dx and Y = y + dy. Here dx and dy are the shifts along the horizontal and vertical directions respectively.
3. Image Flipping: Flipping can be considered an extension of rotation. It allows us to flip an image in the left-right direction or the up-down direction. The leftmost image is the original image from the training set and the other two images are flipped images.
4. Image Noising: This is another popular technique, where we add noise to the image. It helps our model to separate the signal from the noise in an image, which makes the model more robust to changes in an input image.
5. Image Blurring: Images come from different sources. Some may be high quality images whereas others may be low quality. In such cases we can blur the original high quality images, which makes our model more robust to changes in image quality.
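Two of the techniques above, shifting and flipping, are simple enough to sketch in plain Python on a tiny "image" stored as a list of rows. The function names `shift_image`, `flip_left_right` and `flip_up_down`, and the `fill` value used for uncovered pixels, are assumptions for this illustration; a real pipeline would normally rely on an image library instead.

```python
def shift_image(image, dx, dy, fill=0):
    """Move each pixel from (x, y) to (x + dx, y + dy).

    image: list of rows, each row a list of pixel values.
    Pixels shifted outside the image are lost; uncovered pixels get `fill`.
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def flip_left_right(image):
    """Mirror the image horizontally (reverse every row)."""
    return [row[::-1] for row in image]


def flip_up_down(image):
    """Mirror the image vertically (reverse the order of the rows)."""
    return [list(row) for row in image[::-1]]


image = [[1, 2, 3],
         [4, 5, 6]]
right_by_one = shift_image(image, dx=1, dy=0)   # [[0, 1, 2], [0, 4, 5]]
down_by_one = shift_image(image, dx=0, dy=1)    # [[0, 0, 0], [1, 2, 3]]
mirrored = flip_left_right(image)               # [[3, 2, 1], [6, 5, 4]]
upside_down = flip_up_down(image)               # [[4, 5, 6], [1, 2, 3]]
```

Each call returns a new image and leaves the original untouched, so several augmented samples can be generated from one training image.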
Let us look at the practical implementation of image augmentation.
5. MODEL CHECKPOINTING.
This is another important technique used while building a neural network model. It has the following advantages:
- It saves the best model for us during the training process.
In the figure above, predictions are made with the weights and biases stored during the last epoch. The model trains for the specified number of epochs, 50 in our case, and the parameters learnt during the last epoch are used to make predictions. But if we look closely at the figure, the best validation loss is achieved at epoch 45.
Let us look at the model history to understand it more clearly.
With model checkpointing, instead of saving the last model (the parameters of the last epoch), we save the model that produces the best result. This model is called the best model.
- In case of system failure, we won't lose much information, since the model is saved regularly.
Now that we understand the intuition behind model checkpointing, we may wonder how it saves the best model. In Keras we define two parameters for this: monitor, the metric to track (e.g. validation loss), and mode, which says whether a lower (min) or higher (max) value of that metric is better.
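The role of the monitored metric and the mode can be sketched in plain Python. This only mimics the decision of when a checkpoint would be written; it is not the Keras callback itself, and the function name `best_checkpoint` and the example values are assumptions for illustration.

```python
def best_checkpoint(monitored_values, mode="min"):
    """Return (epoch, value) for the epoch whose model would be kept.

    monitored_values: the monitored metric after each epoch,
                      e.g. validation loss or validation accuracy.
    mode: "min" if lower is better (loss), "max" if higher is better (accuracy).
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best_epoch, best_value = None, None
    for epoch, value in enumerate(monitored_values, start=1):
        if best_value is None:
            improved = True
        elif mode == "min":
            improved = value < best_value
        else:
            improved = value > best_value
        if improved:
            # A real checkpoint would save the model's weights here.
            best_epoch, best_value = epoch, value
    return best_epoch, best_value


val_loss = [0.9, 0.5, 0.7]        # made-up values, lower is better
val_accuracy = [0.6, 0.8, 0.7]    # made-up values, higher is better
best_by_loss = best_checkpoint(val_loss, mode="min")
best_by_accuracy = best_checkpoint(val_accuracy, mode="max")
```

In both made-up runs the best model is the one from epoch 2, not the one from the last epoch, which is exactly the case checkpointing is meant to handle.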
Let us look at the practical implementation of the model checkpointing.
These are the techniques that can be used to improve a deep learning model. Let us combine all of them and see how our model performs.
Following is the practical implementation link. You can hit up at my GitHub profile for code & dataset for this part. Please refer EMERGENCY vs NON-EMERGENCY — MODIFIED VERSION.ipynb.
Also, if you are only interested in a demo of this part, please see below.
In this blog we learnt how to solve an image classification problem using an artificial neural network. We also saw various improvement techniques, namely early stopping, dropout, batch normalization, image augmentation and model checkpointing, which can be used to improve a deep learning model. Lastly, we combined all the techniques and looked at the performance of our model.
Do connect with me on LinkedIn : https://www.linkedin.com/in/gaurav-rajpal/
Stay tuned for further updates on demo projects where we work with image datasets using Convolutional Neural Networks.
Gaurav Rajpal (firstname.lastname@example.org)