Weather forecasting with Recurrent Neural Networks

Source: Deep Learning on Medium

With advances in data science and technology, we can get precise weather forecasts for almost any location in the world. Data collected from weather stations and satellites is used to analyze and predict meteorological disasters caused by extreme weather. Machine learning can also be used to forecast the weather from historical weather data.

In this article, we will develop a deep learning model with Recurrent Neural Networks that provides a 4-day forecast of the temperature of a location from 30 days of historical temperature data.

What is a Recurrent Neural Network (RNN)?

An RNN is a deep learning model used for sequential tasks such as time-series prediction. We use an RNN when we want to predict a future outcome based on previous inputs. For example, an RNN can predict the next word in a sentence given the previous words.
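
To make the idea of "previous inputs" concrete, here is a minimal NumPy sketch of the recurrence at the heart of a simple RNN. It is only an illustration: the weights are random and untrained, and the names rnn_forward, W_x, W_h and b are assumptions made for this example, not part of Keras.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple tanh RNN over a 1-D sequence and return the final hidden state."""
    h = np.zeros(W_h.shape[0])           # hidden state starts at zero
    for x_t in inputs:                   # one update per time step
        # the new state mixes the current input with the previous state
        h = np.tanh(W_x * x_t + W_h @ h + b)
    return h

rng = np.random.default_rng(0)
units = 4                                # hypothetical hidden size
W_x = rng.normal(size=units)             # input-to-hidden weights (scalar input)
W_h = rng.normal(size=(units, units))    # hidden-to-hidden weights
b = np.zeros(units)

temps = np.array([0.2, 0.4, 0.6])        # toy, already-scaled temperatures
h_forward = rnn_forward(temps, W_x, W_h, b)
h_reversed = rnn_forward(temps[::-1], W_x, W_h, b)
```

Because the state is carried from one step to the next, feeding the same values in a different order generally produces a different final state. That sensitivity to order is what lets the network use the history of a time series.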

RNN for Weather Forecasting

We will develop an RNN model that provides a 4-day temperature forecast based on 30 days of historical temperature data.

Let’s load the dataset and extract the temperature values:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#import dataset from data.csv file
dataset = pd.read_csv('data.csv')
dataset = dataset.dropna(subset=["Temperature"])
training_set = dataset.iloc[:,4:5].values

We keep only the temperature column, since temperature is what we are going to forecast, and drop all rows where the temperature is missing (NaN).

Next, we apply feature scaling to normalize the temperature to the range 0 to 1.

#Feature Scaling 
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0,1))
training_set_scaled = sc.fit_transform(training_set)
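
For a single column and a feature range of (0, 1), min-max scaling computes (x - min) / (max - min). The sketch below reproduces that arithmetic by hand on a few made-up temperatures (they are not taken from data.csv), along with the inverse step that is needed later to turn predictions back into real temperatures.

```python
import numpy as np

# toy temperatures for illustration only, not from data.csv
temps = np.array([[70.0], [75.0], [80.0], [90.0]])

t_min, t_max = temps.min(), temps.max()
scaled = (temps - t_min) / (t_max - t_min)     # min maps to 0, max maps to 1
restored = scaled * (t_max - t_min) + t_min    # inverse of the scaling

print(scaled.ravel())     # [0.   0.25 0.5  1.  ]
print(restored.ravel())   # [70. 75. 80. 90.]
```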

We will create a training set in which every 30-day window of temperatures is paired with the temperatures of the following 4 days. In other words, the input to our RNN is 30 days of temperature data and the output is a 4-day temperature forecast.

x_train = []
y_train = []
n_future = 4 # next 4 days temperature forecast
n_past = 30 # Past 30 days
for i in range(0, len(training_set_scaled) - n_past - n_future + 1):
    x_train.append(training_set_scaled[i : i + n_past, 0])
    y_train.append(training_set_scaled[i + n_past : i + n_past + n_future, 0])
x_train, y_train = np.array(x_train), np.array(y_train)
# reshape to (samples, time steps, features)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

Each row of x_train contains the 30 temperature values before a given day, and the matching row of y_train contains the 4 temperature values that follow. Since x_train and y_train are lists, we convert them to NumPy arrays, and we reshape x_train to (samples, time steps, 1), the 3-D input shape the LSTM layers expect.
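
The windowing is easier to follow on a toy series. The sketch below applies the same loop to the numbers 0 to 9 with a window of 3 past values and 2 future values. These small sizes and the names x_demo and y_demo are chosen only for illustration.

```python
import numpy as np

series = np.arange(10, dtype=float).reshape(-1, 1)   # toy series 0..9, one column
n_past, n_future = 3, 2                              # small values for illustration only

x_demo, y_demo = [], []
for i in range(0, len(series) - n_past - n_future + 1):
    x_demo.append(series[i : i + n_past, 0])                       # 3 past values
    y_demo.append(series[i + n_past : i + n_past + n_future, 0])   # next 2 values
x_demo, y_demo = np.array(x_demo), np.array(y_demo)
x_demo = np.reshape(x_demo, (x_demo.shape[0], x_demo.shape[1], 1))

print(x_demo.shape, y_demo.shape)    # (6, 3, 1) (6, 2)
print(x_demo[0, :, 0], y_demo[0])    # [0. 1. 2.] [3. 4.]
print(x_demo[-1, :, 0], y_demo[-1])  # [5. 6. 7.] [8. 9.]
```

A series of length N yields N - n_past - n_future + 1 windows, so 10 values give 6 training samples here.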

Now that our training data is ready, let’s build an RNN model for forecasting the weather.

1. First, we import the Keras Sequential model from keras.models and the Keras layers we need from keras.layers, i.e. LSTM, Dense, Dropout and Bidirectional.
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Bidirectional
# Fitting RNN to training set using Keras Callbacks. Read Keras callbacks docs for more info.

2. Let us define the layers of our RNN. We create a sequential model with Sequential() and add the layers one after another. The first layer is a Bidirectional LSTM with 30 memory units. return_sequences=True means that the layer returns its output at every time step instead of only the last one, which the next LSTM layer needs as its input, and input_shape describes the structure of the input. A Bidirectional LSTM processes the sequence both forward (past to future) and backward (future to past) and combines the two outputs. We then add 3 more LSTM layers as hidden layers and a Dense output layer with a linear activation function that outputs the 4-day temperature forecast. Finally, we fit the RNN model to our training data.

regressor = Sequential()
regressor.add(Bidirectional(LSTM(units=30, return_sequences=True, input_shape=(x_train.shape[1], 1))))
regressor.add(LSTM(units=30, return_sequences=True))
regressor.add(LSTM(units=30, return_sequences=True))
regressor.add(LSTM(units=30))
regressor.add(Dense(units=n_future, activation='linear'))
# note: 'acc' is not a meaningful metric for regression; judge training by the loss
regressor.compile(optimizer='adam', loss='mean_squared_error', metrics=['acc'])
regressor.fit(x_train, y_train, epochs=500, batch_size=32)
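
To see what return_sequences and the Bidirectional wrapper do to the output shapes, here is a small NumPy sketch. It uses a plain tanh RNN as a stand-in for an LSTM, random untrained weights, and a hypothetical helper named simple_rnn. It mirrors the shapes only, assuming the forward and backward outputs are concatenated, and is not how Keras is implemented.

```python
import numpy as np

def simple_rnn(seq, W_x, W_h, return_sequences):
    """Toy tanh RNN over seq of shape (timesteps, 1); a stand-in for an LSTM layer."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_t in seq:
        h = np.tanh(W_x @ x_t + W_h @ h)
        outputs.append(h)
    # either the state at every time step, or only the last one
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(0)
timesteps, units = 30, 5                  # units kept small for readability
seq = rng.random((timesteps, 1))          # stand-in for 30 scaled temperatures

W_x_f, W_h_f = rng.normal(size=(units, 1)), 0.1 * rng.normal(size=(units, units))
W_x_b, W_h_b = rng.normal(size=(units, 1)), 0.1 * rng.normal(size=(units, units))

all_steps = simple_rnn(seq, W_x_f, W_h_f, return_sequences=True)
last_only = simple_rnn(seq, W_x_f, W_h_f, return_sequences=False)

# Bidirectional: a second pass reads the reversed sequence; its outputs are
# flipped back so both passes line up per time step, then concatenated
backward = simple_rnn(seq[::-1], W_x_b, W_h_b, return_sequences=True)[::-1]
bidirectional = np.concatenate([all_steps, backward], axis=-1)

print(all_steps.shape)       # (30, 5)
print(last_only.shape)       # (5,)
print(bidirectional.shape)   # (30, 10)
```

This is why every LSTM layer that feeds another LSTM layer sets return_sequences=True, while the last LSTM layer before the Dense layer returns only its final output.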

3. Create test data to evaluate the performance of our model.

# read test dataset
testdataset = pd.read_csv('data (12).csv')
#get only the temperature column
testdataset = testdataset.iloc[:30,3:4].values
real_temperature = pd.read_csv('data (12).csv')
real_temperature = real_temperature.iloc[30:,3:4].values
testing = sc.transform(testdataset)
testing = np.array(testing)
testing = np.reshape(testing,(testing.shape[1],testing.shape[0],1))
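
The final reshape turns the single 30-day window into a batch containing one sample, because the model expects input of shape (samples, time steps, features). A quick sketch with stand-in values (the array below is made up, not read from the test file) shows the shapes involved.

```python
import numpy as np

# stand-in for 30 scaled temperatures in one column
window = np.linspace(0.0, 1.0, 30).reshape(30, 1)

# same reshape as in the article: (30, 1) -> (1, 30, 1)
batch = np.reshape(window, (window.shape[1], window.shape[0], 1))

print(window.shape)   # (30, 1)
print(batch.shape)    # (1, 30, 1)

# equivalent way to add the leading batch axis
same = np.array_equal(batch, window[np.newaxis, :, :])
print(same)           # True
```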

4. Now that we have our test data ready, we can test our RNN model.

predicted_temperature = regressor.predict(testing)
predicted_temperature = sc.inverse_transform(predicted_temperature)
predicted_temperature = np.reshape(predicted_temperature, (predicted_temperature.shape[1], predicted_temperature.shape[0]))

The output of the model is in normalized form, so to get actual temperature values we apply inverse_transform() to predicted_temperature and then reshape it into a column of 4 values.

Let’s compare the predicted and real temperatures. For this test window, each of the 4 predicted values is within about 2 degrees of the corresponding real value.

array([[82.], [82.], [83.], [83.]])
array([[83.76233 ], [83.957565], [83.70461 ], [83.6326 ]])
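
To put a number on the comparison, the sketch below computes the mean absolute error for the two arrays printed above. It assumes the first array holds the real temperatures and the second holds the predictions. With only 4 values this is a sanity check, not a proper evaluation.

```python
import numpy as np

# values copied from the two arrays above; assumption: the first array is
# the real temperature and the second is the model's prediction
real = np.array([[82.0], [82.0], [83.0], [83.0]])
predicted = np.array([[83.76233], [83.957565], [83.70461], [83.6326]])

abs_error = np.abs(predicted - real)
mae = abs_error.mean()         # mean absolute error over the 4 days
max_error = abs_error.max()    # worst single-day error

print(f"MAE: {mae:.2f}, max error: {max_error:.2f}")   # MAE: 1.26, max error: 1.96
```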

If we forecast the temperature for a month and visualize it, we get the following results.

[Figure: Forecast of temperature over a month]

To conclude, you can try to improve the accuracy of the model by changing the number of epochs or adding more layers to the model.