Tensorflow — A deep learning framework

Learn Tensorflow hands-on and get ready to solve big problems

Introduction

In this article we will talk about some widely used Deep Learning frameworks and, as with anything else, the motivation for why they are needed.

When I started working on Deep Learning, I started very much from first principles, meaning first doing the math and then building things from the ground up. That definitely included the forward propagation steps like matrix multiplication, selecting an activation function and computing it, and writing the loss function, and then taking derivatives (also known as back-propagation) to drive that loss towards its minimum using optimization algorithms like gradient descent and Adam.

But as I started working on natural language or computer vision use cases, I needed more and more complex neural networks, with lots of layers and hidden units, and then the most painstaking part: the backpropagation process, which requires a labyrinth of derivatives of derivatives. These include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), etc.

Now, that is when you first seek a robust framework that does these complex computations with ease, so that you can focus on the use case you are trying to solve and not worry too much about the mathematics going on behind it.

That said, in my opinion it is very useful to first write a neural network from scratch and implement some shallow networks, so that you know for sure what you are doing. To me that is freedom and happiness, and I am glad I took that route.

Moreover, these frameworks and libraries are built in a very optimized and efficient manner, which should be leveraged when you are solving a use case. They also run fast and are open-sourced with good governance.

Some of the common Deep Learning frameworks are —

  • Keras
  • Tensorflow
  • PyTorch
  • Theano
  • mxnet
  • Caffe
  • DL4J

In this article we will learn and focus on Tensorflow.

Get familiar with Tensorflow

So, let's take a simple objective function, which in deep learning parlance we call a loss function, and try to minimize it. In general we will minimize an objective function of one choice variable, say "w", with no constraints attached to it (unconstrained optimization).

For the loss function J(w) = w² - 10w + 25 = (w - 5)², w = 5 makes J(w) = 0, and that is the global minimum of the function. That's the math, but let's pretend we don't know it, or don't need to know it, to solve different use cases. Let's use tensorflow to solve this for us —
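
A minimal sketch of this in TensorFlow 1.x (the API used throughout this article) looks like the following:

import tensorflow as tf

w = tf.Variable(0.0, dtype=tf.float32)            # the choice variable, initialized at 0
cost = w**2 - 10*w + 25                           # J(w) = w^2 - 10w + 25; forward prop only
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):                         # 1000 steps of gradient descent
        sess.run(train)
    print(sess.run(w))                            # approximately 5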

If you run the above code, the answer you get is 4.999, which is approximately equal to 5, which is also what our math says. You can also see that we only had to specify the forward propagation; the backprop is done by the optimizer itself.

Now, let's make it a little more generic. Say that in the above code you want to minimize a function of your training data "x": you don't want to hard-code the coefficients (i.e. the data) of the objective, and instead you want to pass them in as a dictionary.
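
A sketch of this, with the coefficients of J(w) fed in through a placeholder and a feed dictionary:

import numpy as np
import tensorflow as tf

coefficients = np.array([[1.], [-10.], [25.]])    # the data: coefficients of J(w)

w = tf.Variable(0.0, dtype=tf.float32)
x = tf.placeholder(tf.float32, [3, 1])            # the coefficients are now fed in at run time
cost = x[0][0] * w**2 + x[1][0] * w + x[2][0]     # same J(w), written in terms of the data x
train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train, feed_dict={x: coefficients})
    print(sess.run(w))                            # again approximately 5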

So, what tensorflow does is create a computational graph and then try to optimize it.

Tensorflow Step by Step in accordance with Deep Learning

Coding in tensorflow involves the following steps —

  1. Create tensors (variables) that are not yet executed or evaluated
  2. Write operations between those tensors
  3. Initialize your tensors
  4. Create a session
  5. Run the session

Let’s see some examples.

  • Multiplication using integers

Define two numbers "a" and "b", then multiply them and store the result in "c".

import tensorflow as tf

# Create the computation graph
a = tf.constant(2)       # tensor "a"
b = tf.constant(10)      # tensor "b"
c = tf.multiply(a, b)    # tensor "c"

# Run the computation graph
sess = tf.Session()
print(sess.run(c))       # The output is 20

Now, as you can see above, we just create the computation graph and then let tensorflow evaluate it.

Now let's pass these values in at run time, using a concept called placeholders.

x = tf.placeholder(tf.int64, name = 'x') # a placeholder tensor
print(sess.run(10 * x, feed_dict = {x : 2})) # running the graph
sess.close()

A placeholder is simply a variable that you will assign data to only later, when running the session.

Compute WX + b, where "W", "X" and "b" are drawn from a random normal distribution. "W" has shape (5, 4), "X" is (4, 1) and "b" is (5, 1).
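
A sketch of this computation; NumPy is used here to draw the random values, so the exact numbers will differ from run to run:

import numpy as np
import tensorflow as tf

np.random.seed(1)                                 # any seed; the values will differ from the article's
W = tf.constant(np.random.randn(5, 4), name="W")  # (5, 4)
X = tf.constant(np.random.randn(4, 1), name="X")  # (4, 1)
b = tf.constant(np.random.randn(5, 1), name="b")  # (5, 1)
Y = tf.add(tf.matmul(W, X), b)                    # WX + b, shape (5, 1)

with tf.Session() as sess:
    print(sess.run(Y))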

The output has shape (5, 1); the exact values depend on the random initialization —

[[-1.46020843]
[ 0.36620383]
[-2.46190526]
[-2.63372713]
[ 3.36976718]]
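
Next, an element-wise activation. The values below match what tf.sigmoid returns for inputs of 1, 10 and 100 (these inputs are my assumption), so a sketch along these lines reproduces them:

import tensorflow as tf

z = tf.placeholder(tf.float32, name="z")
sigmoid = tf.sigmoid(z)                           # 1 / (1 + e^-z)

with tf.Session() as sess:
    for value in [1.0, 10.0, 100.0]:              # assumed inputs
        print(sess.run(sigmoid, feed_dict={z: value}))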

The result is —

0.7310586
0.9999546
1.0

Let’s define our cost function which we are trying to optimize —
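
As a sketch of the idea, tensorflow can compute a cross-entropy cost directly from the logits in one line; the example logits and labels fed in below are assumptions for illustration:

import tensorflow as tf

z = tf.placeholder(tf.float32, name="z")          # logits, i.e. the output of the last linear unit
y = tf.placeholder(tf.float32, name="y")          # true labels
# sigmoid cross-entropy, computed from the logits in a numerically stable way
cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=z, labels=y)

with tf.Session() as sess:
    print(sess.run(cost, feed_dict={z: [0.2, 0.4, 0.7, 0.9],    # assumed example logits
                                    y: [0.0, 0.0, 1.0, 1.0]}))  # assumed example labels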

Say you have a "y" vector that represents the classes 0 to C-1, where C is, say, 3. Then one way to represent "y" in this multi-class problem is to create a one-hot encoding, in the following manner.
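
One way to do this in tensorflow is tf.one_hot; the label vector [1, 2, 0, 1] below is an assumption, chosen so that the result matches the output shown:

import tensorflow as tf

labels = tf.constant([1, 2, 0, 1])                # assumed example labels
C = 3                                             # number of classes
one_hot_matrix = tf.one_hot(labels, depth=C, axis=0)   # one example per column

with tf.Session() as sess:
    print(sess.run(one_hot_matrix))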

The output is —

[[0. 0. 1. 0.]
[1. 0. 0. 1.]
[0. 1. 0. 0.]]

  • Initialize with 0’s and 1’s

def zeros_ones(shape):
    zeros = tf.zeros(shape)    # "zeros" tensor
    ones = tf.ones(shape)      # "ones" tensor

    with tf.Session() as sess:
        zeros = sess.run(zeros)
        ones = sess.run(ones)
    # the with-block closes the session automatically

    return zeros, ones

print(zeros_ones(3))

The output is —

(array([0., 0., 0.], dtype=float32), array([1., 1., 1.], dtype=float32))

Now let's build a full-fledged neural network in Tensorflow.

A full-fledged neural network in Tensorflow

Problem Statement — Teach a computer to decipher a very limited sign language.

Training Set : 1080 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (180 pictures per number)

Test Set : 120 pictures (64 by 64 pixels) of signs representing numbers from 0 to 5 (20 pictures per number)

(Example image from the training set: a sign representing the number 5, i.e. y = 5.)

Before we can proceed, we need to make certain transformations to the data (a sketch of these steps follows the list; a link to the final code will be provided at the end of the article) —

  1. Flatten the dataset
  2. Normalize the image vectors
  3. Convert training and test labels to one-hot matrix
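
A sketch of these three steps, assuming the raw data is loaded as X_train_orig, Y_train_orig, X_test_orig and Y_test_orig (names are my assumption) with images of shape 64 × 64 × 3:

import numpy as np

# Assumed raw shapes: X_train_orig (1080, 64, 64, 3), X_test_orig (120, 64, 64, 3),
# Y_train_orig / Y_test_orig holding integer labels from 0 to 5.

# 1. Flatten: each image becomes a column of 64 * 64 * 3 = 12288 features
# 2. Normalize: divide the pixel values by 255
X_train = X_train_orig.reshape(X_train_orig.shape[0], -1).T / 255.   # -> (12288, 1080)
X_test = X_test_orig.reshape(X_test_orig.shape[0], -1).T / 255.      # -> (12288, 120)

# 3. One-hot: labels become a (6, m) matrix
def convert_to_one_hot(y, C):
    return np.eye(C)[y.reshape(-1)].T

Y_train = convert_to_one_hot(Y_train_orig, 6)     # -> (6, 1080)
Y_test = convert_to_one_hot(Y_test_orig, 6)       # -> (6, 120)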

After these transformations, this is what the shapes of the input data look like —

print ("number of test examples = " + str(X_test.shape[1]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
The Output is ---
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)

Now, to build the classifier, let's decide on the modeling aspect of it. The model we will use here is as below, and we will implement it in Tensorflow: (Linear → Relu) → (Linear → Relu) → (Linear → Softmax)

So, let’s proceed with a sequence of steps.

We first have to create placeholders for X and Y so that we can pass the training data while running the tensorflow session.
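
A sketch of these placeholders; 12288 and 6 come from the shapes printed above, and None leaves the number of examples flexible:

import tensorflow as tf

n_x = 12288                                       # 64 * 64 * 3 input features
n_y = 6                                           # 6 classes

X = tf.placeholder(tf.float32, shape=[n_x, None], name="X")
Y = tf.placeholder(tf.float32, shape=[n_y, None], name="Y")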

  • Initialize the Parameters

Here we will use Xavier initialization for the weights and zero initialization for the biases. The first layer will have 25 hidden units, the second layer 12 hidden units, and the third layer 6 units, corresponding to the 6 classes. So the dimensions will be as below, with a sketch of this initialization after the list —

W1 : [25, 12288]
b1 : [25, 1]
W2 : [12, 25]
b2 : [12, 1]
W3 : [6, 12]
b3 : [6, 1]
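
A sketch of this initialization in TensorFlow 1.x, using tf.get_variable with tf.contrib.layers.xavier_initializer for the weights and tf.zeros_initializer for the biases:

import tensorflow as tf

def initialize_parameters():
    W1 = tf.get_variable("W1", [25, 12288], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b1 = tf.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12, 25], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b2 = tf.get_variable("b2", [12, 1], initializer=tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6, 12], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b3 = tf.get_variable("b3", [6, 1], initializer=tf.zeros_initializer())
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2, "W3": W3, "b3": b3}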

  • Forward propagation

The forward propagation function will take in the dictionary of parameters, complete the forward pass, and return the output of the last linear unit.
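
A sketch of this forward pass, following the (Linear → Relu) → (Linear → Relu) → (Linear → Softmax) architecture above (the softmax itself is folded into the cost later, so the function returns the linear output Z3):

import tensorflow as tf

def forward_propagation(X, parameters):
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]
    W3, b3 = parameters["W3"], parameters["b3"]

    Z1 = tf.add(tf.matmul(W1, X), b1)             # Linear
    A1 = tf.nn.relu(Z1)                           # Relu
    Z2 = tf.add(tf.matmul(W2, A1), b2)            # Linear
    A2 = tf.nn.relu(Z2)                           # Relu
    Z3 = tf.add(tf.matmul(W3, A2), b3)            # Linear (softmax is applied inside the cost)
    return Z3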

  • Backward propagation & parameter updates

This is where the beauty of deep learning frameworks comes into play. We don't need to write lines of code to perform the backpropagation using gradient descent. All we need to do is the following —

After you compute the cost, you create an “optimizer” object. You call this object along with the cost when running the tf session. When called, it performs an optimization on the given cost with the chosen method and learning rate, computing the backpropagation by passing through the tensorflow graph in reverse order, from the cost back to the inputs.

For instance, for gradient descent the optimizer would be:

optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)

To make the optimization you would do:

_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

  • Finally let’s plug all these functions together and build the final model.
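
Here is a condensed sketch of such a model function. It reuses the initialize_parameters and forward_propagation sketches above and, for brevity, trains on the full batch with plain gradient descent; the cost trace and accuracies shown further below come from the author's full run, which used minibatches over 1500 epochs:

import tensorflow as tf

def model(X_train, Y_train, X_test, Y_test, learning_rate=0.0001, num_epochs=1500):  # assumed defaults
    tf.reset_default_graph()                      # start from a fresh graph
    n_x, m = X_train.shape                        # (12288, 1080)
    n_y = Y_train.shape[0]                        # 6

    X = tf.placeholder(tf.float32, [n_x, None], name="X")
    Y = tf.placeholder(tf.float32, [n_y, None], name="Y")

    parameters = initialize_parameters()          # sketched above
    Z3 = forward_propagation(X, parameters)       # sketched above

    # softmax cross-entropy expects shape (examples, classes), hence the transposes
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        logits=tf.transpose(Z3), labels=tf.transpose(Y)))
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(num_epochs):           # full batch here; the original used minibatches
            _, epoch_cost = sess.run([optimizer, cost], feed_dict={X: X_train, Y: Y_train})
            if epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))

        # accuracy on the training and test sets
        correct_prediction = tf.equal(tf.argmax(Z3, axis=0), tf.argmax(Y, axis=0))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
        return sess.run(parameters)               # the trained parameter values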

Now to run the model —

parameters = model(X_train, Y_train, X_test, Y_test)

This is how the cost changes —

Cost after epoch 0: 1.855702
Cost after epoch 100: 1.017255
Cost after epoch 200: 0.733184
Cost after epoch 300: 0.573071
Cost after epoch 400: 0.468573
Cost after epoch 500: 0.381228
Cost after epoch 600: 0.313815
Cost after epoch 700: 0.253708
Cost after epoch 800: 0.203900
Cost after epoch 900: 0.166454
Cost after epoch 1000: 0.146636
Cost after epoch 1100: 0.107279
Cost after epoch 1200: 0.086698
Cost after epoch 1300: 0.059342
Cost after epoch 1400: 0.052289
Parameters have been trained
Train Accuracy: 0.9990741
Test Accuracy: 0.725

Think of the session as a block of code that trains the model. Each time you run the session on a minibatch, it updates the parameters. In total you run the session a large number of times (1500 epochs) until you obtain well-trained parameters.

We can also try to make this model work better by using L2 or dropout regularization to reduce the overfitting (note the gap between the train accuracy of 0.999 and the test accuracy of 0.725).

Source Code

Sources

  1. Deep Learning Specialization by Andrew Ng & team.