Part 1 is here:
Artificial neural networks (ANNs) are not new, but now they have become the next big thing. And it didn't happen by chance… (medium.com)
In the last article, we covered what ANNs are, why they matter and how to create a simple network. We did all of this using the Titanic dataset, and in the end we had an 82% accuracy score. Now we'll add cross validation and grid search to the recipe. Our goal is to improve the model, achieving better accuracy while avoiding overfitting.
Last time, we only split our data into training and test sets. That approach is a good start when training models, but it's not always the best one. Cross validation divides the data set into k equal parts, trains the model on k-1 of them and tests it on the remaining part, then repeats the process until every part has served once as the test set.
Cross validation is more expensive in terms of computational resources. A single train-test split is faster, but k-fold gives a more trustworthy estimate of how the model will behave on unseen data.
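To make the mechanics concrete, here is a minimal sketch of how k-fold splitting works, using scikit-learn's KFold on a toy array (not the Titanic data): each of the k parts takes exactly one turn as the test set.

```python
# Minimal sketch of k-fold mechanics: with n_splits=5 on 5 samples,
# each sample is the test set exactly once.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(5, 2)  # 5 toy samples, 2 features
folds = list(KFold(n_splits=5).split(X))

for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: train on {train_idx}, test on {test_idx}")
# fold 0: train on [1 2 3 4], test on [0]  ... and so on
```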
To apply it, we'll rebuild our network, but this time we'll wrap it in a function to be used by an instance of KerasClassifier(). The classifier takes that function as an argument. We'll then hand our classifier to cross_val_score() to be used as the estimator, with 10 as the number of folds.
By the way, this operation can take a few minutes.
Reminder: do the data preprocessing part before running the code below.
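The step above can be sketched as follows. This is a sketch, not the article's exact gist: the 12-feature width and the stand-in synthetic arrays are assumptions (swap in your preprocessed Titanic X_train/y_train), and the import path is the legacy Keras wrapper from the article's era (on recent setups, use `from scikeras.wrappers import KerasClassifier` with `model=` instead of `build_fn=`).

```python
# Sketch: wrap the network in a function, then score it with 10-fold CV.
# ASSUMPTIONS: 12 preprocessed features; synthetic data stands in for the
# real Titanic arrays so the snippet runs on its own.
import numpy as np
from sklearn.model_selection import cross_val_score
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier  # legacy wrapper

n_features = 12  # assumed width of the preprocessed feature matrix

def build_classifier():
    # Same shape as the part-1 network: one hidden layer, sigmoid output.
    model = Sequential()
    model.add(Dense(6, activation='relu', input_dim=n_features))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, n_features))
y_train = rng.integers(0, 2, size=100)

classifier = KerasClassifier(build_fn=build_classifier,
                             batch_size=10, epochs=5, verbose=0)
accuracies = cross_val_score(estimator=classifier, X=X_train,
                             y=y_train, cv=10)
print("mean accuracy:", accuracies.mean())
print("variance (std):", accuracies.std())
```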
After the training, we check the model's average accuracy and its variance across the folds. We are trying to build a model with high accuracy and low variance, all while avoiding overfitting.
After that, let’s fit our model and see how it behaves when we try to predict the values on the test set.
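A sketch of that fit-and-predict step, using the same part-1 architecture. The synthetic arrays are stand-ins for the real train/test split, and the 0.5 decision threshold is the usual convention for a sigmoid output, not something the article states explicitly.

```python
# Fit on the training set, then predict on the held-out test set.
# ASSUMPTIONS: 12 features, synthetic stand-in data, threshold of 0.5.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from keras.models import Sequential
from keras.layers import Dense

rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(100, 12)), rng.integers(0, 2, size=100)
X_test, y_test = rng.normal(size=(30, 12)), rng.integers(0, 2, size=30)

model = Sequential([
    Dense(6, activation='relu', input_dim=12),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=10, epochs=5, verbose=0)

# Sigmoid outputs are probabilities; threshold at 0.5 to get class labels.
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print(confusion_matrix(y_test, y_pred))
print("test accuracy:", accuracy_score(y_test, y_pred))
```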
Notice a few things: the average accuracy of the model is 81%, with a relatively low variance (3%). After fitting the model, we can make some predictions, and we get a surprise: after applying cross validation, accuracy on the test set improved to 83%. But we can still do better…
While we try to build a better model, we also want to reduce the risk of overfitting. Overfitting happens when your model performs really well on known observations but poorly on new ones. We all want better accuracy, but it means nothing if the model is not able to generalize to new data.
One way to reduce the risk of overfitting is dropout regularisation. It works by randomly 'disabling' a fraction of the neurones during the training phase. To apply it, we'll rewrite our build_classifier() function, adding the dropout instructions.
The dropout layer takes an argument called rate: the probability that each neurone of the layer is dropped during training. We'll use 10% as the value here. But we'll do more than just add dropout; we also want to tweak our model a little.
Let's also add more neurones to our network. For the first layer, we'll use one neurone per feature plus one extra for a bias term (13 neurones in total). For the hidden layer, we'll follow the averaging rule of thumb: the number of neurones is the average of the input and output layer sizes (7 neurones).
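The rewritten build_classifier() might look like this. The sizing (13 and 7 neurones, rate=0.1) follows the text; the 12-feature input width, and placing a dropout layer after each Dense layer rather than only one of them, are my assumptions.

```python
# build_classifier() with dropout: 13 neurones in the first layer,
# 7 in the hidden layer, and rate=0.1 (drop 10% of neurones per step).
# ASSUMPTION: input_dim=12 preprocessed features.
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_classifier():
    model = Sequential()
    model.add(Dense(13, activation='relu', input_dim=12))
    model.add(Dropout(rate=0.1))  # randomly disable 10% of these neurones
    model.add(Dense(7, activation='relu'))
    model.add(Dropout(rate=0.1))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_classifier()
model.summary()
```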
We want to test different hyperparameters to find which combination works best for our model. We'll do it with GridSearchCV, which searches for the best combination of hyperparameters given a dictionary of candidate values. What we'll do here is basically create a dictionary of parameters and hand it to GridSearchCV. The method evaluates every possible combination and returns the one that performs best.
By the way, if the cross-validation step took a few minutes, this step can take much longer. Sometimes it takes hours, so be aware of that before hitting play. You might prefer to run it overnight :P.
Since we have an imbalanced data set, I decided to use the class_weight parameter so the model pays more attention to the survivors.
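A sketch of the grid search step. The grid values, the class weights ({0: 1.0, 1: 1.5}), the 2-fold CV (kept small here so the sketch finishes quickly) and the synthetic data are all illustrative assumptions, not the article's exact settings; the legacy Keras wrapper forwards the optimizer value to build_classifier() and passes class_weight through to model.fit().

```python
# Grid search sketch: a dictionary of candidate hyperparameters handed to
# GridSearchCV. ASSUMPTIONS: grid values, class weights, cv=2 and the
# synthetic stand-in data are illustrative, not the article's settings.
import numpy as np
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier  # legacy wrapper

def build_classifier(optimizer='adam'):
    model = Sequential()
    model.add(Dense(13, activation='relu', input_dim=12))
    model.add(Dropout(0.1))
    model.add(Dense(7, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer=optimizer, loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

rng = np.random.default_rng(2)
X_train, y_train = rng.normal(size=(100, 12)), rng.integers(0, 2, size=100)

classifier = KerasClassifier(build_fn=build_classifier, verbose=0)
parameters = {
    'batch_size': [10, 25],
    'epochs': [5, 10],  # kept tiny here; real runs use far more epochs
    'optimizer': ['adam', 'rmsprop'],
}
grid_search = GridSearchCV(estimator=classifier, param_grid=parameters, cv=2)
# class_weight > 1 for label 1 makes the model pay more attention to survivors
grid_search = grid_search.fit(X_train, y_train,
                              class_weight={0: 1.0, 1: 1.5})

print("best parameters:", grid_search.best_params_)
print("best accuracy:", grid_search.best_score_)
```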
Evaluating the results
It’s time now to check the results. First, let’s see the best parameters and the best accuracy found by the algorithm.
For the accuracy, we had 79%. For the parameters, we had the following:
And now, let’s see which results we have by submitting our test data to the model. But first, the code:
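A sketch of that final evaluation. In the article this is the best model found by the grid search; here, so the snippet runs on its own, a directly fitted model with the same architecture and class weights stands in, on synthetic data.

```python
# Score the tuned model on the held-out test set.
# ASSUMPTIONS: synthetic stand-in data; a directly fitted model replaces
# the grid search's best estimator; class weights are illustrative.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from keras.models import Sequential
from keras.layers import Dense, Dropout

rng = np.random.default_rng(3)
X_train, y_train = rng.normal(size=(100, 12)), rng.integers(0, 2, size=100)
X_test, y_test = rng.normal(size=(30, 12)), rng.integers(0, 2, size=30)

model = Sequential([
    Dense(13, activation='relu', input_dim=12),
    Dropout(0.1),
    Dense(7, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=10, epochs=5, verbose=0,
          class_weight={0: 1.0, 1: 1.5})

y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print(confusion_matrix(y_test, y_pred))
acc = accuracy_score(y_test, y_pred)
print(f"test accuracy: {acc:.2f}")
```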
Then the results:
As you can see, we had an accuracy of 84% this time.
There are plenty of possibilities here, as you may guess: changing the number of neurones, tuning the optimiser, adding layers, preprocessing the input data even more… There are really no limits if you want even better results.
Source: Deep Learning on Medium