Original article can be found here (source): Deep Learning on Medium
How feature engineering trumps algorithms
In this post I will demonstrate in practice that feature engineering, when done right, can achieve better accuracy regardless of which ML/DL algorithm is used for prediction.
Dataset Description and Methodology
I will be using electricity consumption data for a commercial building, covering April 2017 to April 2018. The building has 3 meters (a main meter and 2 sub-meters), so we will have 3 targets for each algorithm. We will also use a weighted RMSE metric to evaluate the results of each algorithm.
In this post, I will try to showcase how different sets of features, combined with 3 different algorithms (XGBoost, LightGBM and GRU), perform on this dataset.
After installing the necessary libraries, we will load the train set (Apr-Dec 2017) and the test set (Jan-Apr 2018).
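The loading step might look like the sketch below. Since the original CSV is not reproduced here, a synthetic frame with the same shape (hourly datetime index, three meter columns) stands in for it, and the column names are assumptions:

```python
import numpy as np
import pandas as pd

# Stand-in for the real CSV: an hourly index over the full period with
# three meter columns (the column names are assumed, not the original ones).
rng = pd.date_range("2017-04-01", "2018-04-30 23:00", freq="H")
df = pd.DataFrame(
    np.random.rand(len(rng), 3) * 100,
    index=rng,
    columns=["main_meter", "sub_meter_1", "sub_meter_2"],
)

# Train on Apr-Dec 2017; hold out Jan-Apr 2018 as the test set.
train = df.loc["2017-04-01":"2017-12-31"]
test = df.loc["2018-01-01":"2018-04-30"]
```

Partial-string slicing on a DatetimeIndex makes this split a one-liner per set.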
We can see from the plots above that the data is not stationary and that there are sudden spikes and dips in usage for all 3 meters. Hence, I chose ML/DL methods over statistical models for forecasting.
We will now begin to generate features for each of the methods mentioned before.
Feature Engineering #1 with XGBoost
As we have seen from the data plots above, there is a sudden spike in consumption from 6 am to 6 pm. We can build a feature like 'working hours' around this, so our algorithm can understand that something special happens during this period.
Additionally, we can see that on weekends (Sunday) power consumption is relatively low compared to weekdays, which is expected for a commercial building. Hence, we will build day-of-week features as well.
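These two calendar features could be built roughly as follows; the column names `working_hours`, `day_of_week` and `is_weekend` are my own, not necessarily the ones in the original notebook:

```python
import pandas as pd

# Hourly timestamps to illustrate the features (48 hours is enough here).
ts = pd.date_range("2017-04-01", periods=48, freq="H")
feats = pd.DataFrame(index=ts)

# 1 during the 6 am - 6 pm window where consumption spikes, else 0.
feats["working_hours"] = ((ts.hour >= 6) & (ts.hour < 18)).astype(int)

# Day of week (0 = Monday ... 6 = Sunday) plus an explicit weekend flag,
# since Sunday consumption is noticeably lower for this commercial building.
feats["day_of_week"] = ts.dayofweek
feats["is_weekend"] = (ts.dayofweek >= 5).astype(int)
```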
This is the complete trend for the given training period. We can see some seasonality over the year: in the main meter readings, power consumption is high in the June-August and November-January periods, and lower in the remaining months.
So, we can construct 2 more features: 'season' and 'month'.
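A sketch of these two features, assuming a simple binary high/low-season encoding based on the months noted above (the exact encoding in the original notebook may differ):

```python
import pandas as pd

ts = pd.date_range("2017-04-01", "2018-03-31", freq="D")
feats = pd.DataFrame(index=ts)
feats["month"] = ts.month

# Binary season flag: 1 for the high-consumption months seen in the trend
# (Jun-Aug and Nov-Jan), 0 otherwise. This encoding is an assumption.
feats["season"] = feats["month"].isin([6, 7, 8, 11, 12, 1]).astype(int)
```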
It’s time to construct the aforementioned features and train our model using the XGBoost regressor.
Note: I have already tuned the hyper-parameters for XGBoost and will not include the tuning code here, for the sake of code clarity and keeping everything to the point.
We can see from the above plot that our predictions are quite good: the model is able to capture the peaks and troughs in the data.
We will now apply regressors to predict the 2 sub-meters.
We can see from the above plots that we are able to predict the sub-meter values with good accuracy, at least on the training set.
As mentioned earlier, we will use a weighted RMSE to evaluate our test results for all 3 meters. This metric gives higher weight to the nearer forecasted months and lower weight to predictions farther in the future. For each meter i the error is

WRMSE_i = sqrt( (1/T) * sum_{t=1..T} e^(-k * d(t)) * ( (m_it - m̂_it) / m̄_i )^2 )

and the overall score averages the three meters:

WRMSE = (WRMSE_A + WRMSE_B + WRMSE_C) / 3

where A, B and C represent main_meter, sub_meter_1 and sub_meter_2 respectively,
- T is the total number of timestamps in the test set for a particular building,
- t is the timestamp for which the prediction is being made,
- k = (ln 2) / 100,
- d(t) is the value of the day in which timestamp t falls,
- m_it is the actual value of meter i at time step t,
- m̂_it is the predicted value of meter i at time step t,
- m̄_i is the mean value of meter i, used for normalization.
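A NumPy sketch of this metric, reconstructed from the definitions above; the exact way the three meters are aggregated is an assumption, not the author's code:

```python
import numpy as np

K = np.log(2) / 100  # k = (ln 2) / 100, halving the weight every 100 days

def weighted_rmse(actual, predicted, days):
    """Weighted RMSE for one meter: errors are normalized by the meter's
    mean and exponentially down-weighted the farther d(t) lies in the future.
    Reconstructed from the stated definitions, not the author's exact code."""
    actual, predicted, days = map(np.asarray, (actual, predicted, days))
    weights = np.exp(-K * days)                        # e^(-k * d(t))
    norm_err = (actual - predicted) / actual.mean()    # normalize by mean of meter i
    return float(np.sqrt(np.sum(weights * norm_err ** 2) / len(actual)))

def overall_score(meters):
    """Average the per-meter weighted RMSEs over the 3 meters (A, B, C)."""
    return sum(weighted_rmse(a, p, d) for a, p, d in meters) / len(meters)
```

With k = (ln 2) / 100, an error 100 days into the future counts for half the weight of an error on day 0.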
Our weighted RMSE on the test set using XGBoost is 26.33.