My first work with Keras

Original article was published by Oscar Rojo on Deep Learning on Medium

With this example, I give an introduction to deep learning with Keras.

Keras was created to be user friendly, modular, easy to extend, and to work with Python. The API was “designed for human beings, not machines,” and “follows best practices for reducing cognitive load.”

Neural layers, cost functions, optimizers, initialization schemes, activation functions, and regularization schemes are all standalone modules that you can combine to create new models. New modules are simple to add, as new classes and functions. Models are defined in Python code, not separate model configuration files.

Photo by Gonzalo Remy on Unsplash

What is Keras?

Keras is a Python library for deep learning that wraps the efficient numerical libraries Theano and TensorFlow.

In this tutorial, you will discover how you can use Keras to develop and evaluate neural network models for multi-class classification problems.

After completing this step-by-step tutorial, you will know:

  • How to load data from CSV and make it available to Keras.
  • How to prepare multi-class classification data for modeling with neural networks.
  • How to evaluate Keras neural network models with scikit-learn.

Import libraries

For this simple example we’ll use only a couple of libraries:

  • Pandas: for data loading and manipulation
  • Scikit-learn: for label encoding and k-fold cross-validation
  • Matplotlib: for data visualization
  • Keras: for model training

Here are the imports if you just want to copy/paste:

import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

Import Dataset

The dataset is the Beer Recipes dataset, available on Kaggle as jtrofe/beer-recipes.

This is a dataset of 75,000 homebrewed beers with 176 different styles. Beer records are user-reported and are classified according to one of the 176 different styles. These recipes go into as much or as little detail as the user provided, but there are at least 5 useful columns where data was entered for each: Original Gravity, Final Gravity, ABV, IBU, and Color.

To prepare the folders and files and download the dataset from Kaggle, we'll use Linux shell commands (the ! prefix runs them from a Jupyter notebook):

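#### Remove directories and files (-f avoids an error if input/ does not exist yet)
! rm -rf input/
! mkdir input/
#### Note: "! cd input/" runs in a subshell and does not change the notebook's directory, so it is omitted
#### Show directory
! ls
#### Download Dataset (the Kaggle CLI saves it as beer-recipes.zip)
! kaggle datasets download -d jtrofe/beer-recipes
#### Unzip Dataset
! unzip beer-recipes.zip
#### Move zip file
! mv beer-recipes.zip input/
#### Move csv files
! mv recipeData.csv input/recipeData.csv
! mv styleData.csv input/styleData.csv
#### Show folder
! ls input/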
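If you prefer to keep these steps in Python rather than shell escapes, the folder preparation can be sketched with the standard library. This is a minimal sketch, not the author's procedure. The function name prepare_input_dir is hypothetical, and the sketch assumes the Kaggle CLI has already saved the archive as beer-recipes.zip. Unlike the shell version, it extracts straight into input/ and leaves the archive where it is.

```python
# Minimal sketch: recreate input/ and extract a downloaded Kaggle archive into it.
# Assumption: the archive path (e.g. "beer-recipes.zip") is passed in by the caller.
import shutil
import zipfile
from pathlib import Path


def prepare_input_dir(zip_path: str, input_dir: str = "input") -> list:
    """Recreate input_dir, extract zip_path into it, return sorted file names."""
    target = Path(input_dir)
    # Equivalent of `rm -rf input/` followed by `mkdir input/`
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)
    # Equivalent of `unzip` plus moving the extracted CSV files into input/
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.iterdir())
```

For example, prepare_input_dir("beer-recipes.zip") would leave the extracted CSV files in input/ and return their names.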

Post-ETL

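After the ETL step, we work with a cleaned dataset, MyData.csv. It holds the style label in the first column and four numeric attributes in the next four columns.

Here’s how to load it directly with pandas: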

# load dataset
dataframe = pd.read_csv("MyData.csv")
dataset = dataframe.values

The dataset can be loaded directly. Because the output variable contains strings, it is easiest to load the data using pandas. We can then split the attributes (columns) into input variables (X) and an output variable (Y).

X = dataset[:,1:5].astype(float)
Y = dataset[:,0]

X
array([[ 6.71, 37. , 52.38, 6.69],
       [ 6.48, 20. , 31.6 , 4.82],
       [ 4.28, 28. , 40.87, 5.13],
       [ 6.02, 33. , 68.3 , 7.74],
       [17.04, 37. , 77.81, 4.54],
       [37.46, 39. , 58.14, 6.76]])

Y
'PALE', 'PALE', 'IPA', 'STOUT', 'IPA', 'PALE', 'IPA', 'IPA', 'IPA',
'IPA', 'PALE', 'IPA', 'PALE', 'IPA', 'PALE', 'STOUT', 'PALE',
'PALE', 'PALE', 'IPA', 'IPA', 'IPA', 'PORTER', 'PALE', 'IPA',
'PORTER', 'PALE', 'ALE', 'IPA', 'PALE', 'IPA', 'IPA', 'IPA', 'IPA', 'IPA', 'PORTER'],
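The column slicing above can be checked on a tiny stand-in table. The frame below is hypothetical, with made-up column names and values. It assumes only what the slicing implies about MyData.csv: the style label is in column 0 and four numeric attributes are in columns 1 to 4.

```python
import pandas as pd

# Hypothetical toy frame with the same layout as MyData.csv:
# column 0 holds the style label, columns 1-4 hold numeric features.
toy = pd.DataFrame({
    "Style": ["IPA", "PALE", "STOUT"],
    "f1": [6.5, 5.0, 7.2],
    "f2": [40.0, 25.0, 35.0],
    "f3": [60.0, 30.0, 45.0],
    "f4": [6.0, 5.0, 8.0],
})

dataset = toy.values                   # object array mixing strings and floats
X = dataset[:, 1:5].astype(float)      # numeric inputs, one row per beer
Y = dataset[:, 0]                      # string labels

# Alternative: select and convert the numeric columns in one step
X_alt = toy.iloc[:, 1:5].to_numpy(dtype=float)

print(X.shape, Y.shape)
```

The iloc/to_numpy line avoids building the intermediate object array, which can matter for a large table.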

Encode The Output Variable

The output variable contains five different string values.

When modeling multi-class classification problems using neural networks, it is good practice to reshape the output attribute from a vector of class values into a matrix with one boolean column per class value. Each column indicates whether or not a given instance has that class value.

This is called one hot encoding or creating dummy variables from a categorical variable.

For example, in this problem the five class values are IPA, ALE, PALE, STOUT, and PORTER.

Each observation, such as IPA or STOUT, can be turned into one row of a one-hot encoded binary matrix. The matrix has one column per class, with a 1 in the column of the observation's class and 0 everywhere else.
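As a concrete illustration, the sketch below one-hot encodes five hypothetical observations with plain NumPy. It orders the classes alphabetically, which is the order LabelEncoder uses when it assigns integers, so each row ends up with a single 1 in the column of its class.

```python
import numpy as np

# Hypothetical observations, one of each style
observations = ["IPA", "ALE", "PALE", "STOUT", "PORTER"]

# Alphabetical class order: ['ALE', 'IPA', 'PALE', 'PORTER', 'STOUT']
classes = sorted(set(observations))
index = {name: i for i, name in enumerate(classes)}

# Row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(len(classes))[[index[o] for o in observations]]
print(one_hot)
```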

We can do this by first encoding the strings consistently as integers using the scikit-learn class LabelEncoder. We then convert the vector of integers to a one-hot encoding using the Keras function to_categorical().

# encode class values as integers (the encoder must be fitted before transform)
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

This is the result:

dummy_y
array([[0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0.]], dtype=float32)

Define The Neural Network Model

If you are new to Keras or deep learning, see this helpful Keras tutorial.

The Keras library provides wrapper classes to allow you to use neural network models developed with Keras in scikit-learn.

Keras provides a KerasClassifier class that can be used as an Estimator in scikit-learn, which is the base type of model in that library. The KerasClassifier takes a model-building function as its build_fn argument. This function must return the constructed neural network model, ready for training.

Below is a function that will create a baseline neural network for the beer style classification problem. It creates a simple fully connected network with one hidden layer that contains 8 neurons.

The hidden layer uses a rectifier (ReLU) activation function, which is good practice. Because we used a one-hot encoding for our beer dataset, the output layer must produce 5 output values, one for each class. The output with the largest value is taken as the class predicted by the model.
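Turning the five output values back into a style name can be sketched with NumPy's argmax and the encoder's inverse_transform. The probability rows here are hypothetical numbers, not outputs of the trained model. The encoder is refitted on the five class names only so that the example is self-contained.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(["ALE", "IPA", "PALE", "PORTER", "STOUT"])

# Hypothetical model outputs for two beers (one row per beer, one column per class)
probs = np.array([
    [0.05, 0.70, 0.15, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.20, 0.50],
])

# The predicted class is the column with the largest output value (here 1 and 4)
predicted_idx = np.argmax(probs, axis=1)
# Map the integer classes back to the original style names
predicted_styles = encoder.inverse_transform(predicted_idx)
print(predicted_styles)
```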

The network topology of this simple one-layer neural network can be summarized as:

4 inputs -> [8 hidden nodes] -> 5 outputs
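The size of this 4 -> 8 -> 5 topology can be worked out by hand. A Dense layer with n inputs and m units has n*m weights plus m biases. The small sketch below does that arithmetic. The helper name dense_params is hypothetical and chosen for illustration. The totals should agree with what model.summary() reports for baseline_model().

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Number of weights plus biases in a fully connected (Dense) layer."""
    return n_inputs * n_units + n_units


hidden = dense_params(4, 8)   # 4*8 weights + 8 biases
output = dense_params(8, 5)   # 8*5 weights + 5 biases
print(hidden, output, hidden + output)
```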

Note that we use a “softmax” activation function in the output layer. This ensures that the output values are between 0 and 1 and sum to 1, so they may be used as predicted probabilities.
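Softmax maps a vector of raw scores z to exp(z_i) / sum_j exp(z_j). Every output is therefore between 0 and 1, and the outputs sum to 1. The sketch below implements that formula in NumPy, using hypothetical scores for the five classes. Subtracting the maximum first is a common trick to avoid overflow and does not change the result.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    shifted = z - np.max(z, axis=-1, keepdims=True)   # avoid overflow in exp
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)


logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])   # hypothetical raw outputs, 5 classes
p = softmax(logits)
print(p.round(3), p.sum())
```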

Finally, the network uses the efficient Adam gradient descent optimization algorithm with a logarithmic loss function, which is called “categorical_crossentropy” in Keras.
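For one-hot targets, categorical cross-entropy reduces to minus the log of the probability that the model assigns to the true class, averaged over samples. The sketch below computes it with NumPy on two hypothetical predictions. It follows the textbook formula, and Keras's own implementation may differ in details such as how it clips probabilities.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


y_true = np.array([[0., 1., 0., 0., 0.],      # one-hot target: IPA
                   [0., 0., 0., 0., 1.]])     # one-hot target: STOUT
y_pred = np.array([[0.05, 0.70, 0.15, 0.05, 0.05],
                   [0.10, 0.10, 0.10, 0.20, 0.50]])

loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))
```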

# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(5, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

We can now create our KerasClassifier for use in scikit-learn.

We can also pass arguments when constructing the KerasClassifier. These are passed on to the fit() function used internally to train the neural network. Here, we pass 200 as the number of epochs and 5 as the batch size for training the model. Progress output during training is turned off by setting verbose to 0.

estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

Evaluate The Model with k-Fold Cross Validation

We can now evaluate the neural network model on our training data.

scikit-learn has excellent capabilities for evaluating models using a suite of techniques. The gold standard for evaluating machine learning models is k-fold cross validation.

First we can define the model evaluation procedure. Here, we set the number of folds to be 10 (an excellent default) and to shuffle the data before partitioning it.

kfold = KFold(n_splits=10, shuffle=True)
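The sketch below shows what KFold with shuffle=True does. It splits ten hypothetical samples into 5 folds rather than the 10 used above, to keep the printout short, and counts how often each sample lands in a test fold. The random_state argument is an addition here, used only to make the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import KFold

X_toy = np.arange(20).reshape(10, 2)   # 10 hypothetical samples, 2 features each
kfold = KFold(n_splits=5, shuffle=True, random_state=7)

test_counts = np.zeros(len(X_toy), dtype=int)
for fold, (train_idx, test_idx) in enumerate(kfold.split(X_toy)):
    test_counts[test_idx] += 1
    print(fold, "train:", train_idx, "test:", test_idx)

# Every sample should appear in exactly one test fold
print(test_counts)
```

Each sample is used for testing exactly once and for training in every other fold. This is why the per-fold scores from cross_val_score can be averaged into a single estimate.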

Now we can evaluate our model (estimator) on our dataset (X and dummy_y) using a 10-fold cross-validation procedure (kfold).

Evaluating the model only takes approximately 10 seconds. It returns an array with one accuracy score for each of the 10 models constructed, one per split of the dataset.

results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Baseline: 76.10% (4.41%)

The results are summarized as both the mean and standard deviation of the model accuracy on the dataset.

Note: Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times and comparing the average performance.

This is a reasonable estimation of the performance of the model on unseen data. It is also within the realm of known top results for this problem.


And there you have it: it was easy.

I hope this helps you with your own training.

Never give up!

See you on LinkedIn!