Weather Predictions: ‘Classic’ Machine Learning Models Vs Keras

Source: Deep Learning on Medium

The Keras model did slightly better than the XGB version, correctly classifying 26 rainy/no-rain days out of 31. The XGB model managed 25 out of 31.

Both the Keras and XGB models shared the same weakness: their recall scores, i.e. the ability to correctly classify rainy days as such, were relatively low compared to their precision and F1 scores.

Still, not too shabby overall. With a richer set of weather data, such as humidity and atmospheric pressure, we could possibly get even more accurate predictions.

Now, let’s dive into each approach separately.

‘Classic’ Machine Learning Approach Using Logistic Regression and XGB Classifier

Predictive modeling involves trying a number of different models to see which one works best, as well as fine-tuning each model’s hyperparameters to find the optimal combination.

Scikit-learn’s pipeline and grid search features allow you to organise both tasks efficiently and run them in one go. In notebook 5.1, I chose to pit a Logistic Regression model against an XGB Classifier.
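A minimal sketch of that pipeline/grid-search setup. The feature matrix, labels and hyperparameter values below are placeholders, not the ones in notebook 5.1, and the XGB grid is only indicated in a comment so the sketch runs without xgboost installed:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
X_train = rng.random((120, 4))        # placeholder weather features
y_train = rng.integers(0, 2, 120)     # 1 = rain, 0 = no rain

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear")),
])

# One parameter grid per candidate model; GridSearchCV tries every
# combination in one go and keeps the best. An XGBClassifier grid would
# simply be a second dict in this list.
param_grid = [
    {"clf": [LogisticRegression(solver="liblinear")],
     "clf__C": [0.01, 0.1, 1, 10]},
    # {"clf": [XGBClassifier()], "clf__max_depth": [3, 5], ...},
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)
print(search.best_params_)
```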

You can include more models in the same pipeline, or increase the number of XGB hyperparameters to tune. But the trade-off in time and resources is something you’ll have to weigh carefully. I picked the LogReg and XGB models to illustrate the extremes of the trade-offs one could encounter in these tasks.

The LogReg model took just seconds for a decent grid search, while the XGB Classifier took 7 hours (on a 6-core 2018 Mac Mini) for a grid search over 5 hyperparameters.

The XGB model gave slightly better scores, and could presumably be improved with a more exhaustive grid search. But is the 0.02 bump in scores worth the extra effort? Highly debatable in this instance.

I went ahead with the XGB model in any case, even though its performance was only slightly better than the LogReg model. Let’s have a look at the confusion matrix for the XGB’s predictions for December 2019:

The XGB model correctly classified 25 rainy/dry days out of 31, giving it an accuracy score of 0.8.

The model wrongly predicted that it would rain on 2 days, when they were in fact sunny (false positives). It also wrongly predicted 4 sunny days when it in fact rained on those days (false negatives).

The XGB model is weakest in terms of its recall score, in this case meaning its ability to correctly identify rainy days as such (10 out of 14). Can a deep learning model do better?
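The arithmetic behind those scores can be checked with Scikit-learn’s metrics functions. The labels below are reconstructed from the counts stated above (15 true negatives, 2 false positives, 4 false negatives, 10 true positives), not the notebook’s raw output:

```python
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score

y_true = [0] * 17 + [1] * 14                       # 17 dry days, 14 rainy days
y_pred = [0] * 15 + [1] * 2 + [0] * 4 + [1] * 10   # XGB predictions (reconstructed)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)                              # 15 2 4 10
print(round(accuracy_score(y_true, y_pred), 2))    # 0.81 -> 25 of 31 days correct
print(round(recall_score(y_true, y_pred), 2))      # 0.71 -> 10 of 14 rainy days caught
```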

Keras Classifier With Grid Search

There’s a bewildering number of ways one could start experimenting with deep learning models. I wanted to start small, and see if I could integrate what I had learnt in Scikit-learn with the new techniques.

Keras popped up quickly as a good option, given the availability of two wrappers for the Scikit-learn API (for classification and regression). I also relied on two excellent online posts (here and here) to guide my code for notebook 5.2.

The workflow is essentially the same as routine Scikit-learn approaches, though one new step required a bit of trial and error: defining the function that creates and returns a Keras sequential model.

Depending on how many hyperparameters you want to tune, the structure of the function will have to be adjusted accordingly. For this post, I opted to tune the number of hidden layers, the number of neurons, the optimizer, the dropout rate, the batch size and the number of epochs.
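A sketch of what such a model-building function might look like: the tunable hyperparameters become keyword arguments. The layer sizes and defaults here are illustrative, not the notebook’s actual values, and note that batch size and epochs are fit-time settings handled by the wrapper, not arguments of this function:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

def create_model(hidden_layers=1, neurons=16, optimizer="adam",
                 dropout_rate=0.2, n_features=4):
    """Build and compile a binary rain/no-rain classifier."""
    model = Sequential()
    model.add(Input(shape=(n_features,)))
    for _ in range(hidden_layers):
        model.add(Dense(neurons, activation="relu"))
        model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation="sigmoid"))   # outputs probability of rain
    model.compile(loss="binary_crossentropy", optimizer=optimizer,
                  metrics=["accuracy"])
    return model
```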

The pipeline/gridsearch construction is essentially the same, aside from the need to pass the Keras function you’ve defined to the Keras Classifier’s “build_fn” argument.
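A sketch of that hand-off, using the legacy keras.wrappers.scikit_learn wrapper the post’s setup implies (newer installations use the separate scikeras package instead). The stand-in build function and grid values are illustrative, not the notebook’s:

```python
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def create_model(neurons=16, optimizer="adam"):
    # Minimal stand-in for the model-building function described above.
    model = Sequential([Input(shape=(4,)),
                        Dense(neurons, activation="relu"),
                        Dense(1, activation="sigmoid")])
    model.compile(loss="binary_crossentropy", optimizer=optimizer)
    return model

# The function you've defined goes to the wrapper's build_fn argument; its
# keyword arguments, plus fit-time settings like batch_size and epochs, then
# become tunable like any Scikit-learn hyperparameter.
keras_clf = KerasClassifier(build_fn=create_model, verbose=0)

param_grid = {
    "neurons": [16, 32],
    "optimizer": ["adam", "rmsprop"],
    "batch_size": [16, 32],
    "epochs": [50, 100],
}

search = GridSearchCV(keras_clf, param_grid, cv=3, scoring="f1")
# search.fit(X_train, y_train)  # 16 combinations x 3 folds: slow on CPU
```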

A grid search with these settings took over 18 hours on my machine (CPU-only). Adjust accordingly if you want a faster trial. Let’s have a look at the confusion matrix for the predictions by the optimised Keras Classifier:

The Keras model performed better than the XGB model by correctly predicting one more day of rainy weather, ie, it correctly classified 26 rainy/dry days out of 31, compared to 25 for the XGB model.

The model wrongly predicted that it would rain on 2 days, when they were in fact sunny (false positives). It also wrongly predicted 3 sunny days when it in fact rained on those days (false negatives).

Like the XGB model, the Keras model is also weakest in terms of its recall score, in this case meaning its ability to correctly identify rainy days as such (11 out of 14). Let’s compare the metrics for the XGB and Keras models’ predictions on the validation set (December 2019 weather data):
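A side-by-side comparison can be generated with classification_report, using the confusion-matrix counts stated above (XGB: 15 TN / 2 FP / 4 FN / 10 TP; Keras: 15 TN / 2 FP / 3 FN / 11 TP). The labels are reconstructed from those counts, not the notebook’s raw output:

```python
from sklearn.metrics import classification_report

y_true = [0] * 17 + [1] * 14                            # December 2019: 17 dry, 14 rainy
xgb_pred = [0] * 15 + [1] * 2 + [0] * 4 + [1] * 10      # 25 of 31 correct
keras_pred = [0] * 15 + [1] * 2 + [0] * 3 + [1] * 11    # 26 of 31 correct

for name, pred in [("XGB", xgb_pred), ("Keras", keras_pred)]:
    print(name)
    print(classification_report(y_true, pred, target_names=["dry", "rain"]))
```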