Deep Learning in 15 Lines of Python



Are you getting started with Machine Learning, or having trouble finding a place to start? Same here, which is why I created this post. I'm not here to boil the ocean, so here are a few things to answer off the bat:

What do these 15 lines of code do? Multiclass classification.
We have 3 known classes of flowers that we're trying to sort.
What is Deep Learning? It's a subset of Machine Learning based on structures in the brain. (Example: neural networks)
Why did you choose Keras? (Also, what is Keras…)
Keras is a high-level neural networks API written in Python that runs on top of TensorFlow. Keras does Deep Learning like Google Home automates your home: you could still turn off your lights, lock the door, etc. yourself (this would be TensorFlow), but instead you can do it all through a simpler interface (Keras).

Coderview (Code-Overview)

Before looking at the code, let's understand what each piece is trying to do:

  1. Import the dataset. This is a one-liner to pull the iris data.
  2. Prepare the inputs by removing the last column in the dataset.
  3. Prepare the output. (This looks a bit hairy, so I'll go over it in detail below.)
  4. Create, compile, and run the model.
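
Putting those four steps together, here's roughly what the full script looks like assembled from the snippets below. The data-loading URL is an assumption on my part; the original one-liner just needs to pull the iris CSV:

import pandas as pd
import keras
from keras.layers import Dense

# 1. Pull the iris data (URL assumed; any headerless copy of the iris CSV works)
iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)

# 2. Inputs: every column except the label column (4)
input_x = iris.drop(4, axis=1)

# 3. Outputs: class names -> integers -> one-hot binary arrays
key_value = {v: k for k, v in enumerate(list(iris[4].unique()))}
output_class_int = iris.replace({4: key_value})[4]
output_y = keras.utils.to_categorical(output_class_int, len(key_value))

# 4. Create, compile, fit, and evaluate the model
model = keras.models.Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(input_x, output_y, epochs=150, batch_size=15, verbose=1)
score = model.evaluate(input_x, output_y, batch_size=15)
print(score)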

Results:
Epoch 1/150
150/150 [==============================] - 0s 2ms/step - loss: 4.1396 - acc: 0.3333
Epoch 2/150
150/150 [==============================] - 0s 89us/step - loss: 3.6754 - acc: 0.3333
Epoch 3/150
150/150 [==============================] - 0s 94us/step - loss: 3.2460 - acc: 0.3333
…
Epoch 148/150
150/150 [==============================] - 0s 95us/step - loss: 0.1776 - acc: 0.9733
Epoch 149/150
150/150 [==============================] - 0s 86us/step - loss: 0.1772 - acc: 0.9667
Epoch 150/150
150/150 [==============================] - 0s 94us/step - loss: 0.1759 - acc: 0.9733
150/150 [==============================] - 0s 2ms/step
[0.17427776996046304, 0.9800000011920929]

After this run, we got an accuracy of 98% when comparing the model's predictions to our outputs, and a final loss of 0.174. (The loss measures how poorly the model fits the dataset; lower is better.)

That's it! Now let's dig into the details.

Preparing the dataset, inputs and outputs (1,2,3)

Initially, the data set looks like this (1):

iris.iloc[[0,50,100]]
       0    1    2    3    4
----  ---  ---  ---  ---  ---------------
0     5.1  3.5  1.4  0.2  Iris-setosa
50    7.0  3.2  4.7  1.4  Iris-versicolor
100   6.3  3.3  6.0  2.5  Iris-virginica

We remove the labels to make the inputs (2):

input_x = iris.drop(4,axis=1)
       0    1    2    3
----  ---  ---  ---  ---
0     5.1  3.5  1.4  0.2
50    7.0  3.2  4.7  1.4
100   6.3  3.3  6.0  2.5

Outputs (3)

We need to turn the 3 different classes (Iris-setosa, Iris-versicolor, Iris-virginica) into unbiased categories that Keras can understand.

What does that mean? Lots of folks' first instinct will be to turn them into numbers: Iris-setosa = 1, Iris-versicolor = 2, Iris-virginica = 3.

But those outputs are integers, and integers inherently carry "weight": the model would treat Iris-virginica (3) as somehow "more" than Iris-setosa (1).
This is why Keras has a function called to_categorical. We use it to turn these 3 classes into arrays of binaries (one-hot encoding). Basically, each class gets its own column:

Iris-setosa  Iris-versicolor  Iris-virginica
-----------  ---------------  --------------
     1              0               0
     0              1               0
     0              0               1

This way no class has higher weight than the other two.
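
If you want to see this behavior in isolation, here's a quick sketch (my own illustration, not part of the 15 lines):

from keras.utils import to_categorical

# Three integer class indexes become three one-hot rows
to_categorical([0, 1, 2])
# array([[1., 0., 0.],
#        [0., 1., 0.],
#        [0., 0., 1.]], dtype=float32)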

Here's how I approached this in Python:

  1. Get me a key_value dict that looks like this:
    {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
    – Get all the unique classes: iris[4].unique()
    – Turn that sequence into a list()
    – Loop over the list, assigning the elements as keys and their indexes as values: {v: k for k, v in enumerate(alist)}
  2. Replace the output names ("Iris-…") with the indexes from key_value and store that into output_class_int.
  3. Finally, replace those 0s, 1s, and 2s with the binary arrays above
    (0 = [1,0,0], 1 = [0,1,0], 2 = [0,0,1]) and store that into output_y.

# Build the name -> index mapping from the unique class labels
key_value = {v: k for k, v in enumerate(list(iris[4].unique()))}
# Swap each class name in column 4 for its integer index
output_class_int = iris.replace({4: key_value})[4]
# One-hot encode those integers into binary arrays
output_y = keras.utils.to_categorical(output_class_int, len(key_value))

I know that was a lot. Let's join input_x and output_y just so you can see how this looks:
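
Here's one way to build that view (this join is just for display and isn't part of the 15 lines):

# Label the one-hot columns 4-6 and stick them next to the inputs
joined = pd.concat([input_x, pd.DataFrame(output_y, columns=[4, 5, 6])], axis=1)
joined.iloc[[0, 50, 100]]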

       0    1    2    3    4  5  6
----  ---  ---  ---  ---  --  --  --
0     5.1  3.5  1.4  0.2   1  0  0
50    7.0  3.2  4.7  1.4   0  1  0
100   6.3  3.3  6.0  2.5   0  0  1

If you compare this to the earlier table under "Initially, the data set looks like this (1)", you'll see that row 0 is Iris-setosa, row 50 is Iris-versicolor, and row 100 is Iris-virginica.

The Model

Create (4a)

from keras.layers import Dense

model = keras.models.Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))

For our model, we use Keras' Sequential model. You can think of the Sequential model as a layered cake (I've been watching a lot of the Great British Baking Show lately) where each "layer" is part of the neural network.

Sometimes it's hard to pick layers.

The first layer: model.add(Dense(8,input_dim=4,activation='relu'))

Add this first layer. Let's use 8 nodes for this layer. We have 4 input attributes (look at input_x), which is why input_dim=4. Let's use the Rectified Linear Unit (relu) as our activation function.
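
If relu is new to you, it's simple enough to write out by hand (a sketch, not how Keras implements it internally):

# relu: negative values become 0, positive values pass through unchanged
def relu(x):
    return max(0.0, x)

relu(-2.3)  # 0.0
relu(1.7)   # 1.7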

The second layer: model.add(Dense(3, activation='softmax'))

Add this last layer. There will be 3 attributes for each output (this is because of the to_categorical function we used above in the output section). Let's use softmax as our activation function, which squashes the 3 outputs into probabilities that sum to 1.
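
A minimal sketch of what softmax does (again, my own illustration rather than Keras' internal code):

import numpy as np

# softmax: exponentiate, then normalize so the outputs sum to 1
def softmax(z):
    e = np.exp(z - np.max(z))  # subtracting the max keeps the exponentials stable
    return e / e.sum()

softmax(np.array([2.0, 1.0, 0.1]))
# array([0.659, 0.242, 0.099]) -> the model's "confidence" in each of the 3 classes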

Compile (4b)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

After we create a model in Keras, we need to compile it.
Back to our cake analogy: we've baked and stacked the cake, and now we're putting on the icing and extras.

loss='categorical_crossentropy'
We're setting the loss function to categorical_crossentropy, which measures the distance between the model's predicted probabilities and the true one-hot labels (a perfect prediction gives a loss of 0).

optimizer='adam'
We're setting the optimizer to Adam (adaptive moment estimation), which updates the network's weights based on the training data.
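
The string 'adam' uses Keras' default settings. If you ever want to tune it, you can pass an optimizer object instead; the learning-rate argument name varies across Keras versions, so treat this as a sketch:

from keras.optimizers import Adam

# Same compile call, but with an explicit learning rate instead of the 'adam' string
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),
              metrics=['accuracy'])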

metrics=['accuracy']
We're adding accuracy to our output metrics. (This is why we see the 0.98 in our output.)

Fit and Evaluate (4c)

model.fit(input_x, output_y, epochs=150, batch_size=15, verbose=1)
score = model.evaluate(input_x, output_y, batch_size=15)
score

Fit and evaluate are where everything ties together.
Back to the Great British Baking Show analogy:
– Everyone is given a recipe (input_x)
– They have their technique and execution (the model)
– Which creates a baked good (the model's output)
– This is judged (evaluate) by a panel
– Who have preconceived expectations (output_y)

Note: this is the first time our data actually enters the model. Notice that input_x and output_y aren't even mentioned in the create or compile sections. This is important to remember conceptually with Keras.

What are all these other components in the fit and evaluate sections?

First, verbose is just a flag to show the output of each epoch in the results. The epochs variable determines how many full passes over the dataset the model trains for. batch_size tells the model how large a subset to cut for each run. How are these two related? Here's a quick example:

Say you have 105 people (dataset_size = 105). You're running a test and want to test 10 at a time (batch_size = 10). You'll have to run this test 11 times, with the last iteration only having 5 people in it. You've decided you want everyone to take the test 3 times (epochs = 3). In order to get all 105 people to take the test 3 times in batches of 10, we'll have to run the test 33 times (11 runs per epoch with 3 epochs).
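
The same arithmetic in Python (a throwaway sketch):

import math

dataset_size, batch_size, epochs = 105, 10, 3
steps_per_epoch = math.ceil(dataset_size / batch_size)  # 11 runs to cover everyone once
total_steps = steps_per_epoch * epochs                  # 33 runs total
print(steps_per_epoch, total_steps)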

Back to our exercise: we have a dataset_size of 150 (that's how many rows are in the CSV), and we're running 150 epochs with a batch_size of 15, which works out to 10 batches per epoch and 1,500 batches overall.
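
Once the model is trained, a natural follow-up (not in the original 15 lines; the index-to-name mapping below is my own) is to predict a single flower and translate the result back into a class name:

# Predict the first row and map the winning index back to a flower name
probs = model.predict(input_x.iloc[[0]])               # softmax probabilities, shape (1, 3)
index_to_name = {v: k for k, v in key_value.items()}   # invert the name -> index mapping
print(index_to_name[probs.argmax()])                   # e.g. 'Iris-setosa'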

Next Steps

This is a barebones attempt at multiclass classification using Keras. As with a lot of deep learning, there are a bunch of ways we can expand this out:

  1. Evaluating this against other test sets of data (see the sketch after this list)
  2. Adding attributes or classes to expand the input and output
  3. Adding layers to the network (which has its pros and cons)
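
For the first of those, here's one possible starting point using scikit-learn's train_test_split. scikit-learn isn't used anywhere in this post, so this is purely a suggestion:

from sklearn.model_selection import train_test_split

# Hold out 20% of the flowers so evaluation happens on data the model never trained on
x_train, x_test, y_train, y_test = train_test_split(
    input_x, output_y, test_size=0.2, random_state=42)

model.fit(x_train, y_train, epochs=150, batch_size=15, verbose=0)
print(model.evaluate(x_test, y_test, batch_size=15))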

Thanks for reading! If you have questions/comments please feel free to drop them below. If you have more pointed questions feel free to reach out to me on LinkedIn.

Cheers,
AndrewDoesData
DAG Solutions

Appendix

  • Keras Input Explanation on Stack Overflow: For me, the top answer was the best overview of how to think about Keras' interpretation of neural networks.
  • Guide to the Keras Sequential Model: Keras has wonderful documentation. Remember, this documentation assumes you have an understanding of Deep Learning concepts (which most of us do not); a lot of my research was googling all the terms in it.
  • Activation Functions: relu vs. softmax vs. everything else? If you're new to Deep Learning, this is a great cheat sheet for activations.
  • Adam Optimization for Deep Learning: Here's a great walkthrough of Adam and why it's used over classical stochastic gradient descent.
  • ML Cheatsheet Loss Functions: As with activation and optimization, here is a good read on loss functions and why they're necessary in ML.
  • Multi-Classification Tutorial with Keras Deep Learning: This was the basis of my code. I branched from this approach in two ways: 1) output_y is built in a more Pythonic way instead of using the numpy encoder, and 2) I used Keras' native evaluator instead of numpy.
  • Building a Deep Learning Model with Keras: This is the other post I used to build out the fit functions.
