Deep learning is an increasingly popular subset of machine learning. Deep learning models are built using neural networks. A neural network takes in inputs, which are then processed in hidden layers using weights that are adjusted during training. Then the model spits out a prediction. The weights are adjusted to find patterns in order to make better predictions. The user does not need to specify what patterns to look for — the neural network learns on its own.

Keras is a user-friendly neural network library written in Python. In this tutorial, I will go over two deep learning models using Keras: one for regression and one for classification. We will build a regression model to predict an employee’s wage per hour, and we will build a classification model to predict whether or not a patient has diabetes.

Note: The datasets we will be using are relatively clean, so we will not perform any data preprocessing in order to get our data ready for modeling. Datasets that you will use in future projects may not be so clean — for example, they may have missing values — so you may need to use data preprocessing techniques to alter your datasets to get more accurate results.

#### Reading in the training data

For our regression deep learning model, the first step is to read in the data we will use as input. For this example, we are using the ‘hourly wages’ dataset. To start, we will use Pandas to read in the data. I will not go into detail on Pandas, but it is a library you should become familiar with if you’re looking to dive further into data science and machine learning.

‘df’ stands for dataframe. Pandas reads in the csv file as a dataframe. The ‘head()’ function will show the first 5 rows of the dataframe so you can check that the data has been read in properly and can take an initial look at how the data is structured.

import pandas as pd

#read in data using pandas

train_df = pd.read_csv('data/hourly_wages_data.csv')

#check data has been read in properly

train_df.head()

#### Split up the dataset into inputs and targets

Next, we need to split up our dataset into inputs (X) and our target (y). Our input will be every column except ‘wage_per_hour’ because ‘wage_per_hour’ is what we will be attempting to predict. Therefore, ‘wage_per_hour’ will be our target.

We will use the pandas ‘drop’ function to drop the column ‘wage_per_hour’ from our dataframe and store the remaining columns in the variable ‘train_X’. This will be our input.

#create a dataframe with all training data except the target column

train_X = train_df.drop(columns=['wage_per_hour'])

#check that the target variable has been removed

train_X.head()

We will store the column ‘wage_per_hour’ in our target variable, ‘train_y’.

#create a dataframe with only the target column

train_y = train_df[['wage_per_hour']]

#view dataframe

train_y.head()

#### Building the model

Next, we have to build the model. Here is the code:

from keras.models import Sequential

from keras.layers import Dense

#create model

model = Sequential()

#get number of columns in training data

n_cols = train_X.shape[1]

#add model layers

model.add(Dense(10, activation='relu', input_shape=(n_cols,)))

model.add(Dense(10, activation='relu'))

model.add(Dense(1))

The model type that we will be using is Sequential. Sequential is the easiest way to build a model in Keras. It allows you to build a model layer by layer. Each layer has weights that correspond to the layer that follows it.

We use the ‘add()’ function to add layers to our model. We will add two layers and an output layer.

‘Dense’ is the layer type. Dense is a standard layer type that works for most cases. In a dense layer, all nodes in the previous layer connect to the nodes in the current layer.
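To make "all nodes connect to all nodes" concrete, here is a minimal pure-Python sketch of what a single dense layer computes for one input row. This is an illustration, not how Keras implements it internally, and the function name `dense_forward` and the example weights are made up for this sketch.

```python
def dense_forward(inputs, weights, biases):
    """Compute one dense layer's raw outputs for a single input row.

    weights[j] holds the weights connecting every input to node j.
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        # Every input feeds every node: weighted sum plus that node's bias.
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(total)
    return outputs


# Hypothetical example: 3 inputs feeding a layer with 2 nodes.
row = [1.0, 2.0, 3.0]
layer_weights = [[0.5, -1.0, 0.25], [1.0, 1.0, 1.0]]
layer_biases = [0.0, -2.0]

print(dense_forward(row, layer_weights, layer_biases))  # [-0.75, 4.0]
```

During training, the values in `layer_weights` and `layer_biases` are what get adjusted.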

We have 10 nodes in each of our first two layers. This number can also be in the hundreds or thousands. Increasing the number of nodes in each layer increases model capacity. I will go into further detail about the effects of increasing model capacity shortly.

‘Activation’ is the activation function for the layer. An activation function allows models to take into account nonlinear relationships. For example, if you are predicting diabetes in patients, going from age 10 to 11 is different than going from age 60–61.

The activation function we will be using is ReLU, or Rectified Linear Unit. Although it is made of two linear pieces, it has been shown to work well in neural networks.
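The "two linear pieces" are easy to see in code. Below is a small sketch of ReLU written by hand (the helper name `relu` is my own, Keras applies the equivalent for you when you pass `activation='relu'`): negative inputs become 0 and positive inputs pass through unchanged.

```python
def relu(x):
    """Rectified Linear Unit: clip negatives to 0, pass positives through."""
    return max(0.0, x)


values = [-2.0, -0.5, 0.0, 1.5, 3.0]
print([relu(v) for v in values])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```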

The first layer needs an input shape. The input shape describes a single row of the input, so it only needs the number of columns, which is stored in ‘n_cols’. The trailing comma simply makes ‘(n_cols,)’ a one-element tuple. The number of rows is not part of the input shape, so the model can accept any number of rows.

The last layer is the output layer. It only has one node, which is for our prediction.

#### Compiling the model

Next, we need to compile our model. Compiling the model takes two parameters: optimizer and loss.

The optimizer controls how the weights are updated during training, including the learning rate. We will be using ‘adam’ as our optimizer. Adam is generally a good optimizer to use for many cases. The adam optimizer adjusts the learning rate throughout training.

The learning rate determines how large a step the optimizer takes each time it updates the weights, and therefore how quickly the model arrives at good weights. A smaller learning rate may lead to more accurate weights (up to a certain point), but training will take longer.
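Here is a rough sketch of that trade-off. Note that it uses plain gradient descent on a toy function, f(w) = w², rather than Adam, and the function name and the specific learning rates are just choices I made for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=10.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w                  # derivative of w**2
        w = w - learning_rate * gradient  # step against the gradient
    return w


# The best possible weight here is 0, so values closer to 0 are better.
for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

With the same number of steps, the small rate (0.01) is still far from 0, the moderate rate (0.1) gets close, and the overly large rate (1.1) overshoots on every step and moves further away.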

For our loss function, we will use ‘mean_squared_error’. It is calculated by taking the average squared difference between the predicted and actual values. It is a popular loss function for regression problems. The closer to 0 this is, the better the model performed.

#compile model using mse as a measure of model performance

model.compile(optimizer='adam', loss='mean_squared_error')
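To see what ‘mean_squared_error’ measures, here is a small hand-written version. The wage values are invented for the example, and Keras computes this for you during training.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Made-up hourly wages and predictions.
actual_wages = [10.0, 12.0, 8.0]
predicted_wages = [11.0, 12.0, 5.0]

# Squared errors are 1, 0 and 9, so the mean is 10 / 3.
print(mean_squared_error(actual_wages, predicted_wages))
```

Because the errors are squared, the one prediction that is off by 3 contributes far more to the loss than the one that is off by 1.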

#### Training the model

Now we will train our model. To train, we will use the ‘fit()’ function on our model with the following five parameters: training data (train_X), target data (train_y), validation split, the number of epochs and callbacks.

The validation split sets aside part of the data for validation instead of training. Keras takes these rows from the end of the data we provide, before any shuffling. During training, we will be able to see the validation loss, which gives the mean squared error of our model on the validation set. We will set the validation split at 0.2, which means that the last 20% of the training data we provide to the model will be set aside for testing model performance.
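As a sketch of what a validation split of 0.2 does to the data: as far as I know, Keras holds out the rows at the end of the data you pass in, before any shuffling. The helper below is my own approximation of that behavior, not Keras code.

```python
def split_for_validation(rows, validation_split):
    """Hold out the last fraction of rows for validation."""
    split_at = int(len(rows) * (1.0 - validation_split))
    return rows[:split_at], rows[split_at:]


# Ten placeholder rows, numbered 0 to 9.
rows = list(range(10))
train_rows, val_rows = split_for_validation(rows, 0.2)

print(train_rows)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(val_rows)    # [8, 9]
```

If your file is sorted in some meaningful way, it is worth shuffling the rows before training so the held-out rows are representative.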

The number of epochs is the number of times the model will cycle through the data. The more epochs we run, the more the model will improve, up to a certain point. After that point, the model will stop improving during each epoch. In addition, the more epochs, the longer the model will take to run. To monitor this, we will use ‘early stopping’.

Early stopping will stop the model from training before the number of epochs is reached if the model stops improving. We will set the ‘patience’ of our early stopping monitor to 3. This means that after 3 epochs in a row in which the validation loss doesn’t improve, training will stop. Sometimes, the validation loss can stop improving then improve in the next epoch, but after 3 epochs in which the validation loss doesn’t improve, it usually won’t improve again.

from keras.callbacks import EarlyStopping

#set early stopping monitor so the model stops training when it won't improve anymore

early_stopping_monitor = EarlyStopping(patience=3)

#train model

model.fit(train_X, train_y, validation_split=0.2, epochs=30, callbacks=[early_stopping_monitor])
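The patience rule can be sketched in a few lines of plain Python. This is a simplified stand-in for the early stopping callback, and the validation losses are invented for the example.

```python
def epochs_until_stop(val_losses, patience):
    """Return the number of epochs that run before patience is exhausted."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses, one per epoch.
losses = [40.0, 35.0, 33.0, 33.5, 32.0, 32.4, 32.1, 32.3, 31.0]
print(epochs_until_stop(losses, patience=3))  # 8
```

The dip at epoch 4 does not stop training because the loss improves again at epoch 5. Training stops after epoch 8, once three epochs in a row fail to beat the best loss of 32.0, so the ninth epoch never runs.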

#### Making predictions on new data

If we want to use this model to make predictions on new data, we would use the ‘predict()’ function, passing in our new data. The output would be ‘wage_per_hour’ predictions.

#example of how to use our newly trained model to make predictions on unseen data (we will pretend our new data is saved in a dataframe called 'test_X')

test_y_predictions = model.predict(test_X)

Congrats! You have built a deep learning model in Keras! It is not very accurate yet, but that can improve with a larger amount of training data and more ‘model capacity’.

#### Model capacity

As you increase the number of nodes and layers in a model, the model capacity increases. Increasing model capacity can lead to a more accurate model, up to a certain point, at which the model will stop improving. Generally, the more training data you provide, the larger the model should be. We are only using a tiny amount of data, so our model is pretty small. A larger model also requires more computation and takes longer to train.
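One rough way to put a number on capacity is to count the weights and biases the model has to learn. The sketch below does this for stacks of dense layers. The layer sizes match the two regression models in this tutorial, but the value of `n_cols` is a placeholder I picked, so check `train_X.shape[1]` for the real one.

```python
def dense_param_count(n_inputs, layer_sizes):
    """Count the weights and biases in a stack of dense layers."""
    total = 0
    previous = n_inputs
    for size in layer_sizes:
        total += previous * size + size  # weights plus one bias per node
        previous = size
    return total


n_cols = 9  # hypothetical number of input columns

small_model = dense_param_count(n_cols, [10, 10, 1])
large_model = dense_param_count(n_cols, [200, 200, 200, 1])

print(small_model)  # 221
print(large_model)  # 82601
```

Under that assumption, the larger model has several hundred times as many parameters to fit, which is why it needs more data and more time to train.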

Let’s create a new model using the same training data as our previous model. This time, we will add a layer and increase the nodes in each layer to 200. We will train the model to see if increasing the model capacity will improve our validation score.

#training a new model on the same data to show the effect of increasing model capacity

#create model

model_mc = Sequential()

#add model layers

model_mc.add(Dense(200, activation='relu', input_shape=(n_cols,)))

model_mc.add(Dense(200, activation='relu'))

model_mc.add(Dense(200, activation='relu'))

model_mc.add(Dense(1))

#compile model using mse as a measure of model performance

model_mc.compile(optimizer='adam', loss='mean_squared_error')

#train model

model_mc.fit(train_X, train_y, validation_split=0.2, epochs=30, callbacks=[early_stopping_monitor])

We can see that by increasing our model capacity, we have improved our validation loss from 32.63 in our old model to 28.06 in our new model.

#### Classification model

Now let’s move on to building our model for classification. Since many steps will be a repeat from the previous model, I will only go over new concepts.

For this next model, we are going to predict if patients have diabetes or not.

#read in training data

train_df_2 = pd.read_csv('documents/data/diabetes_data.csv')

#view data structure

train_df_2.head()

#create a dataframe with all training data except the target column

train_X_2 = train_df_2.drop(columns=['diabetes'])

#check that the target variable has been removed

train_X_2.head()

When separating the target column, we need to call the ‘to_categorical()’ function so that the column will be ‘one-hot encoded’. Currently, a patient with no diabetes is represented with a 0 in the diabetes column and a patient with diabetes is represented with a 1. With one-hot encoding, the integer is removed and a binary variable is created for each category. In our case, we have two categories: no diabetes and diabetes. A patient with no diabetes will be represented by [1 0] and a patient with diabetes will be represented by [0 1].
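Here is a small hand-rolled sketch of one-hot encoding, just to show the transformation. The helper `one_hot` and the example labels are my own, and in the tutorial code Keras handles this step.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of this row's class
        encoded.append(row)
    return encoded


# Made-up labels: 0 = no diabetes, 1 = diabetes.
labels = [0, 1, 1, 0]
print(one_hot(labels, num_classes=2))
# [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```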

from keras.utils import to_categorical

#one-hot encode target column

train_y_2 = to_categorical(train_df_2.diabetes)

#check that target column has been converted

train_y_2[0:5]

#create model

model_2 = Sequential()

#get number of columns in training data

n_cols_2 = train_X_2.shape[1]

#add layers to model

model_2.add(Dense(250, activation='relu', input_shape=(n_cols_2,)))

model_2.add(Dense(250, activation='relu'))

model_2.add(Dense(250, activation='relu'))

model_2.add(Dense(2, activation='softmax'))

The last layer of our model has 2 nodes — one for each option: the patient has diabetes or they don’t.

The activation is ‘softmax’. Softmax makes the output sum up to 1 so the output can be interpreted as probabilities. The model will then make its prediction based on which option has a higher probability.
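To illustrate, here is a minimal softmax written with only the standard library. The raw scores are made up, and subtracting the maximum score first is a common trick for numerical stability that does not change the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw outputs from the last layer: [no diabetes, diabetes].
probs = softmax([1.0, 3.0])
predicted_class = probs.index(max(probs))

print(probs)            # roughly [0.12, 0.88]
print(predicted_class)  # 1, the option with the higher probability
```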

#compile model using accuracy to measure model performance

model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

We will use ‘categorical_crossentropy’ for our loss function. This is the most common choice for classification. A lower score indicates that the model is performing better.
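As a rough sketch of why a lower score is better, here is categorical cross-entropy computed by hand for a single patient. This simplified version skips details such as the clipping Keras applies to avoid taking the log of 0, and the probabilities are invented.

```python
import math


def categorical_crossentropy(true_one_hot, predicted_probs):
    """Negative log of the probability assigned to the true class."""
    return -sum(
        t * math.log(p)
        for t, p in zip(true_one_hot, predicted_probs)
        if t > 0
    )


# The patient has diabetes, so the one-hot target is [0, 1].
target = [0, 1]
confident_right = categorical_crossentropy(target, [0.1, 0.9])
unsure = categorical_crossentropy(target, [0.5, 0.5])
confident_wrong = categorical_crossentropy(target, [0.9, 0.1])

print(confident_right, unsure, confident_wrong)
```

The loss is smallest when the model puts high probability on the correct class and grows quickly when it is confidently wrong.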

To make things even easier to interpret, we will use the ‘accuracy’ metric to see the accuracy score on the validation set at the end of each epoch.

#train model

model_2.fit(train_X_2, train_y_2, epochs=30, validation_split=0.2, callbacks=[early_stopping_monitor])

Congrats! You are now well on your way to building amazing deep learning models in Keras!

Thanks for reading! The Jupyter notebook for this tutorial can be found here.

Source: Deep Learning on Medium