Getting Started with TensorFlow

Original article was published by Bhanu Soni on Artificial Intelligence on Medium


The mathematical concept of a tensor can be broadly explained in the following way: a scalar has the lowest dimensionality, followed by a vector and then a matrix; a tensor is the next object in that line. Scalars, vectors, and matrices are tensors of rank 0, 1, and 2 respectively, and tensors are simply a generalization of these concepts to arbitrary rank.
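The rank hierarchy can be illustrated with NumPy arrays, whose `ndim` attribute corresponds to tensor rank (NumPy is used here purely for illustration):

```python
import numpy as np

scalar = np.array(7.0)                        # rank 0
vector = np.array([1.0, 2.0, 3.0])            # rank 1
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # rank 2
tensor3 = np.zeros((2, 3, 4))                 # rank 3

print(scalar.ndim, vector.ndim, matrix.ndim, tensor3.ndim)  # 0 1 2 3
```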

How TensorFlow Works

At first, computation in TensorFlow may seem needlessly complicated. But there is a reason for it: because of how TensorFlow treats computation, developing more complicated algorithms is relatively easy. We will look into the pseudocode of a TensorFlow algorithm.

How to do it…

Here we will introduce the general flow of TensorFlow algorithms:

  1. Import or generate datasets: All of our machine-learning algorithms will depend on datasets. We will either generate data or use an outside source of datasets. Sometimes it is better to rely on generated data because then we know the expected outcome.
  2. Transform and normalize data: Normally, input datasets do not come in the shape TensorFlow expects, so we need to transform them into the accepted shape before we can use them. Most algorithms also expect normalized data, and we will do this here as well. TensorFlow has built-in functions that can normalize the data for you, as follows:
    data = tf.nn.batch_norm_with_global_normalization(…)
  3. Partition datasets into train, test, and validation sets: We generally want to test our algorithms on different sets than the ones we have trained on.
  4. Set algorithm parameters (hyperparameters): Our algorithms usually have a set of parameters that we hold constant throughout the procedure, for example, the number of iterations or the learning rate. It is considered good form to initialize these together so the reader or user can easily find them, as follows:
    learning_rate = 0.01
    batch_size = 100
    iterations = 1000
  5. Initialize variables and placeholders: TensorFlow depends on knowing what it can and cannot modify. TensorFlow will modify/adjust the variables (weights and biases) during optimization to minimize a loss function, and we feed in data through placeholders. We need to initialize both the variables and the placeholders with a size and a data type so that TensorFlow knows what to expect.

a_var = tf.constant(42)
x_input = tf.placeholder(tf.float32, [None, input_size])
y_input = tf.placeholder(tf.float32, [None, num_classes])

6. Define the model structure: After we have the data and have initialized our variables and placeholders, we have to define the model. This is done by building a computational graph: we tell TensorFlow what operations must be applied to the variables and placeholders to arrive at our model predictions.

y_pred = tf.add(tf.matmul(x_input, weight_matrix), b_matrix)

7. Declare the loss functions: After defining the model, we must be able to evaluate the output. This is where we declare the loss function. The loss function is very important as it tells us how far off our predictions are from the actual values.

loss = tf.reduce_mean(tf.square(y_actual - y_pred))

8. Initialize and train the model: Now that we have everything in place, we need to create an instance of our graph, feed in the data through the placeholders, and let TensorFlow change the variables to better predict our training data.

with tf.Session(graph=graph) as session:
    …

Note that we can also initiate our graph with:
session = tf.Session(graph=graph)

9. Evaluate the model: Once we have built and trained the model, we should evaluate the model by looking at how well it does with new data through some specified criteria. We evaluate on the train and test set and these evaluations will allow us to see if the model is underfit or overfit.

10. Tune hyperparameters: Most of the time, we will want to go back and change some of the hyperparameters, based on the model performance. We then repeat the previous steps with different hyperparameters and evaluate the model on the validation set.

11. Deploy/predict new outcomes: It is also important to know how to make predictions on new, unseen data. We can do this with all of our models once we have them trained.
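The steps above can be sketched end to end without any framework; the following NumPy-only sketch runs the same flow (generate data, split it, fix hyperparameters, define a model y = w*x + b, minimize an L2 loss by gradient descent, evaluate on held-out data). The toy dataset and all names here are illustrative, not part of the original recipe:

```python
import numpy as np

np.random.seed(0)

# Steps 1-3: generate a toy dataset for y = 2x + 1 and split it.
x = np.random.rand(200, 1)
y = 2.0 * x + 1.0 + 0.05 * np.random.randn(200, 1)
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

# Step 4: hyperparameters, grouped in one place.
learning_rate = 0.1
iterations = 1000

# Steps 5-6: initialize parameters and define the model y_pred = w*x + b.
w, b = 0.0, 0.0
for _ in range(iterations):
    y_pred = w * x_train + b
    error = y_pred - y_train          # Step 7: the L2 loss is mean(error**2)
    # Step 8: gradient-descent update of the trainable parameters.
    w -= learning_rate * 2.0 * np.mean(error * x_train)
    b -= learning_rate * 2.0 * np.mean(error)

# Step 9: evaluate on the held-out test set.
test_mse = np.mean((w * x_test + b - y_test) ** 2)
print(round(w, 2), round(b, 2))  # w ≈ 2.0, b ≈ 1.0
```

Steps 10 and 11 would then amount to re-running this loop with different hyperparameters and applying the learned `w` and `b` to new inputs.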

Declaring Tensors

Tensors are the primary data structure that TensorFlow uses to operate on the computational graph. We can declare these tensors as variables or feed them in as placeholders. First, we must know how to create tensors.

When we create a tensor and declare it to be a variable, TensorFlow creates several graph structures in our computation graph. It is also important to point out that just by creating a tensor, TensorFlow is not adding anything to the computational graph; TensorFlow does this only after creating variables out of the tensor.

How to do it…

Here we will cover the main ways to create tensors in TensorFlow:

  1. Fixed tensors:
    A) Create a zero filled tensor. Use the following:
    zero_tsr = tf.zeros([row_dim, col_dim])

    B) Create a one filled tensor. Use the following:
    ones_tsr = tf.ones([row_dim, col_dim])

    C) Create a constant filled tensor. Use the following:
    filled_tsr = tf.fill([row_dim, col_dim], 42)

    D) Create a tensor out of an existing constant. Use the following:
    constant_tsr = tf.constant([1,2,3])

  2. Tensors of similar shape: We can also initialize variables based on the shape of other tensors, as follows:
    zeros_similar = tf.zeros_like(constant_tsr)
    ones_similar = tf.ones_like(constant_tsr)

Note that since these tensors depend on prior tensors, we must initialize them in order. Attempting to initialize all the tensors at once will result in an error.

3. Sequence tensors:
TensorFlow allows us to specify tensors that contain defined intervals. The following functions behave very similarly to Python's range() outputs and NumPy's linspace() outputs. See the following function:

linear_tsr = tf.linspace(start=0.0, stop=1.0, num=3)

The resulting tensor is the sequence [0.0, 0.5, 1.0] . Note that this
function includes the specified stop value. See the following function:

integer_seq_tsr = tf.range(start=6, limit=15, delta=3)

The result is the sequence [6, 9, 12]. Note that this function does not include the limit value.
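The endpoint behavior mirrors NumPy, which can serve as a quick sanity check (NumPy is used here only for comparison):

```python
import numpy as np

# linspace includes the specified stop value.
print(np.linspace(0.0, 1.0, num=3))  # [0.  0.5 1. ]

# arange (like tf.range) excludes the limit value.
print(np.arange(6, 15, 3))           # [ 6  9 12]
```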

4. Random tensors:

  1. The following generates random numbers from a uniform distribution:
    randunif_tsr = tf.random_uniform([row_dim, col_dim],
    minval=0, maxval=1)
    Note that this random uniform distribution draws from the interval that includes the minval but not the maxval ( minval <= x < maxval ).
  2. To get a tensor with random draws from a normal distribution, use the following:
    randnorm_tsr = tf.random_normal([row_dim, col_dim],
    mean=0.0, stddev=1.0)
  3. There are also times when we wish to generate normal random values that are assured to fall within certain bounds. The truncated_normal() function always picks normal values within two standard deviations of the specified mean. See the following:
    truncnorm_tsr = tf.truncated_normal([row_dim, col_dim], mean=0.0, stddev=1.0)
  4. We might also be interested in randomizing entries of arrays. To accomplish this, there are two functions that help us: random_shuffle() and random_crop(). See the following:
    shuffled_output = tf.random_shuffle(input_tensor)
    cropped_output = tf.random_crop(input_tensor, crop_size)
  5. We might be interested in randomly cropping an image of size (height, width, 3), where there are three color channels. To fix a dimension in the cropped_output, you must give it the maximum size in that dimension:
    cropped_image = tf.random_crop(my_image, [height/2, width/2, 3])
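A NumPy sketch (illustrative only, not TensorFlow's implementation) makes the uniform bounds, shuffle, and crop semantics concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform draws lie in the half-open interval [minval, maxval).
u = rng.uniform(0.0, 1.0, size=(3, 4))
print(u.min() >= 0.0 and u.max() < 1.0)  # True

# Shuffling permutes rows; the row contents themselves are unchanged.
rows = np.arange(12).reshape(4, 3)
shuffled = rng.permutation(rows)
print(sorted(shuffled[:, 0].tolist()))   # [0, 3, 6, 9]

# A random crop keeps a contiguous sub-block of the requested size;
# fixing a dimension means cropping it at its full extent.
height, width = 8, 8
image = rng.uniform(size=(height, width, 3))
top = rng.integers(0, height - height // 2 + 1)
left = rng.integers(0, width - width // 2 + 1)
crop = image[top:top + height // 2, left:left + width // 2, :]
print(crop.shape)                        # (4, 4, 3)
```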

Once we have decided how to create the tensors, we can also create the corresponding variables by wrapping the tensor in the Variable() function, as follows:

my_var = tf.Variable(tf.zeros([row_dim, col_dim]))

We are not limited to the built-in functions: we can convert any NumPy array, Python list, or constant to a tensor using the function convert_to_tensor(). Note that this function also accepts tensors as an input in case we wish to generalize a computation inside a function.

Using Placeholders and Variables

Placeholders and variables are key tools for using computational graphs in TensorFlow. We must understand the difference and when to best use them to our advantage.

One of the most important distinctions to make with the data is whether it is a placeholder or a variable. Variables are the parameters of the algorithm, and TensorFlow keeps track of how to change these to optimize the algorithm. Placeholders are objects that allow you to feed in data of a specific type and shape, and that depend on the results of the computational graph, such as the expected outcome of a computation.

How to do it…

The main way to create a variable is by using the Variable() function, which takes a tensor as an input and outputs a variable. This is only the declaration; we still need to initialize the variable. Initializing is what puts the variable, with the corresponding methods, on the computational graph. Here is an example of creating and initializing a variable:
my_var = tf.Variable(tf.zeros([2,3]))
sess = tf.Session()
initialize_op = tf.global_variables_initializer()
sess.run(initialize_op)

  1. Placeholders are just holding the position for data to be fed into the graph.
  2. Placeholders get data from a feed_dict argument in the session. To put a placeholder in the graph, we must perform at least one operation on the placeholder.
  3. We initialize the graph, declare x to be a placeholder, and define y as the identity operation on x, which just returns x.
  4. We then create data to feed into the x placeholder and run the identity operation. It is worth noting that TensorFlow will not return a self-referenced placeholder in the feed dictionary. The code is shown here and the resulting graph is shown in the next section.
sess = tf.Session()
x = tf.placeholder(tf.float32, shape=[2,2])
y = tf.identity(x)
x_vals = np.random.rand(2,2)
sess.run(y, feed_dict={x: x_vals})
# Note that sess.run(x, feed_dict={x: x_vals}) will result in a self-referencing error.

How it works…

The computational graph of initializing a variable as a tensor of zeros is shown in the following figure:

We can see what the computational graph looks like in detail with just one variable, initialized to all zeros. The grey shaded region is a very detailed view of the operations and constants involved. The main computational graph with less detail is the smaller graph outside of the grey region in the upper-right corner.

Similarly, the computational graph of feeding a NumPy array into a placeholder can be seen in the following figure:

  1. During the run of the computational graph, we have to tell TensorFlow when to initialize the variables we have created.
  2. TensorFlow must be informed about when it can initialize the variables.
  3. While each variable has an initializer method, the most common way to do
    this is to use the helper function global_variables_initializer(). This function creates an operation in the graph that initializes all the variables we have created, as follows:
    initializer_op = tf.global_variables_initializer()
  4. But if we want to initialize a variable based on the results of initializing another variable, we have to initialize variables in the order we want, as follows:
    sess = tf.Session()
    first_var = tf.Variable(tf.zeros([2,3]))
    sess.run(first_var.initializer)
    second_var = tf.Variable(tf.zeros_like(first_var))
    # Depends on first_var
    sess.run(second_var.initializer)

Working with Matrices

Understanding how TensorFlow works with matrices is very important to understanding the flow of data through computational graphs.

Many algorithms depend on matrix operations. TensorFlow gives us easy-to-use operations to perform such matrix calculations. For all of the following examples, we can create a graph session by running the following code:
import tensorflow as tf
import numpy as np
sess = tf.Session()

How to do it…

  1. Creating matrices: We can create two-dimensional matrices from NumPy arrays or nested lists, as we described in the earlier section on tensors. We can also use the tensor creation functions and specify a two-dimensional shape for functions such as zeros() , ones() , truncated_normal() , and so on. TensorFlow also allows us to create a diagonal matrix from a one-dimensional array or list with the function diag(), as follows:

identity_matrix = tf.diag([1.0, 1.0, 1.0])
A = tf.truncated_normal([2, 3])
B = tf.fill([2, 3], 5.0)
C = tf.random_uniform([3, 2])
D = tf.convert_to_tensor(np.array([[1., 2., 3.], [-3., -7., -1.], [0., 5., -2.]]))
print(sess.run(identity_matrix))
[[ 1. 0. 0.]
 [ 0. 1. 0.]
 [ 0. 0. 1.]]
print(sess.run(A))
[[ 0.96751703 0.11397751 -0.3438891 ]
 [-0.10132604 -0.8432678 … ]]
print(sess.run(B))
[[ 5. 5. 5.]
 [ 5. 5. 5.]]
print(sess.run(C))
[[ 0.33184157 0.08907614]
 [ 0.53189191 0.67605299]
 [ 0.95889051 0.67061249]]
print(sess.run(D))
[[ 1. 2. 3.]
 [-3. -7. -1.]
 [ 0. 5. -2.]]

Note that if we were to run sess.run again, we would reinitialize the random tensors and end up with different random values.
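For the deterministic constructors above, NumPy has direct counterparts that can serve as a cross-check of the printed values (the NumPy names are this sketch's assumption, not part of the recipe):

```python
import numpy as np

identity_matrix = np.diag([1.0, 1.0, 1.0])  # like tf.diag
B = np.full((2, 3), 5.0)                    # like tf.fill
D = np.array([[1., 2., 3.], [-3., -7., -1.], [0., 5., -2.]])

print(identity_matrix)  # the 3x3 identity, as printed above
print(B)                # a 2x3 matrix of fives
```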

2) For addition and subtraction, use the following:
print(sess.run(A + B))
[[ 4.61596632 5.39771316 4.4325695 ]
 [ 3.26702736 5.14477345 4.98265553]]
print(sess.run(B - B))
[[ 0. 0. 0.]
 [ 0. 0. 0.]]
Matrix multiplication works similarly:
print(sess.run(tf.matmul(B, identity_matrix)))
[[ 5. 5. 5.]
 [ 5. 5. 5.]]

3) Also, the function matmul() has arguments that specify whether or not to transpose the arguments before multiplication or whether each matrix is sparse.

4) Transpose the arguments as follows:
print(sess.run(tf.transpose(C)))
[[ 0.67124544 0.26766731 0.99068872]
 [ 0.25006068 0.86560275 0.58411312]]

5) Again, it is worth mentioning that this reinitialization gives us different values than before.

6) For the determinant, use the following:
print(sess.run(tf.matrix_determinant(D)))
-38.0
For the matrix inverse, use the following:
print(sess.run(tf.matrix_inverse(D)))
[[-0.5 -0.5 -0.5]
 [ 0.15789474 0.05263158 0.21052632]
 [ 0.39473684 0.13157895 0.02631579]]

Note that the inverse method is based on the Cholesky decomposition if the matrix is symmetric positive definite or the LU decomposition otherwise.
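The determinant and inverse shown above can be verified independently with NumPy:

```python
import numpy as np

D = np.array([[1., 2., 3.], [-3., -7., -1.], [0., 5., -2.]])

print(np.linalg.det(D))   # ≈ -38.0
inv = np.linalg.inv(D)
print(inv[0])             # ≈ [-0.5 -0.5 -0.5]
print(np.allclose(D @ inv, np.eye(3)))  # True
```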

7) Decompositions:
For the Cholesky decomposition, use the following:
print(sess.run(tf.cholesky(identity_matrix)))
[[ 1. 0. 0.]
 [ 0. 1. 0.]
 [ 0. 0. 1.]]

8) For eigenvalues and eigenvectors, use the following code:
print(sess.run(tf.self_adjoint_eig(D)))
[[-10.65907521 -0.22750691 2.88658212]
 [ 0.21749542 0.63250104 -0.74339638]
 [ 0.84526515 0.2587998 0.46749277]
 [ -0.4880805 0.73004459 0.47834331]]

Note that the function self_adjoint_eig() outputs the eigenvalues in the first row and the eigenvectors in the remaining rows. In mathematics, this is known as the eigendecomposition of a matrix.

How it works…

TensorFlow provides all the tools for us to get started with numerical computations and adding such computations to our graphs. This notation might seem quite heavy for simple matrix operations. Remember that we are adding these operations to the graph and telling TensorFlow what tensors to run through those operations.

Implementing Activation Functions

When we start to use neural networks, we will use activation functions regularly, because activation functions are a mandatory part of any neural network. Applied after a node's weighted inputs and bias are combined, they transform the result before it is passed on. In TensorFlow, activation functions are non-linear operations that act on tensors, and they operate in a similar way to the previous mathematical operations. Activation functions serve many purposes, but the main ones are to introduce a non-linearity into the graph while normalizing the outputs. Start a TensorFlow graph with the following commands:

import tensorflow as tf
sess = tf.Session()

How to do it…

The activation functions live in the neural network (nn) library in TensorFlow. Besides using built-in activation functions, we can also design our own using TensorFlow operations. We can import the predefined activation functions ( import tensorflow.nn as nn ) or be explicit and write .nn in our function calls. Here, we choose to be explicit with each function call:

  1. The rectified linear unit, known as ReLU, is the most common and basic way to introduce a non-linearity into neural networks. This function is just max(0,x). It is continuous but not smooth. It appears as follows:
    print(sess.run(tf.nn.relu([-3., 3., 10.])))
    [ 0. 3. 10.]
  2. There will be times when we wish to cap the linearly increasing part of the preceding ReLU activation function. We can do this by nesting the max(0,x) function into a min() function. The implementation that TensorFlow has is called the ReLU6 function. This is defined as min(max(0,x),6). It appears as follows:

print(sess.run(tf.nn.relu6([-3., 3., 10.])))
[ 0. 3. 6.]

3. The sigmoid function is the most common continuous and smooth activation function. It is also called a logistic function and has the form 1/(1+exp(-x)). The sigmoid is not often used because of its tendency to zero-out the backpropagation terms during training. It appears as follows:
print(sess.run(tf.nn.sigmoid([-1., 0., 1.])))
[ 0.26894143 0.5 0.7310586 ]

We should be aware that some activation functions are not zero centred, such as the sigmoid. This will require us to zero-mean the data prior to using it in most computational graph algorithms.

4. Another smooth activation function is the hyperbolic tangent. The hyperbolic tangent function is very similar to the sigmoid except that instead of having a range between 0 and 1, it has a range between -1 and 1. The function has the form of the ratio of the hyperbolic sine over the hyperbolic cosine, which can also be written as (exp(x) - exp(-x))/(exp(x) + exp(-x)). It appears as follows:
print(sess.run(tf.nn.tanh([-1., 0., 1.])))
[-0.76159418 0. 0.76159418]

5. The softsign function also gets used as an activation function. The form of this function is x/(abs(x) + 1). The softsign function is supposed to be a continuous approximation to the sign function. It appears as follows:
print(sess.run(tf.nn.softsign([-1., 0., 1.])))
[-0.5 0. 0.5]

6. Another function, the softplus, is a smooth version of the ReLU function. The form of this function is log(exp(x) + 1). It appears as follows:
print(sess.run(tf.nn.softplus([-1., 0., 1.])))
[ 0.31326166 0.69314718 1.31326163]

The softplus goes to infinity as the input increases whereas the softsign goes to 1. As the input gets smaller, however, the softplus approaches zero and the softsign goes to -1.

7. The Exponential Linear Unit (ELU) is very similar to the softplus function except that the bottom asymptote is -1 instead of 0. The form is exp(x) - 1 if x < 0, and x otherwise. It appears as follows:
print(sess.run(tf.nn.elu([-1., 0., 1.])))
[-0.63212055 0. 1. ]
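Since each activation is a one-line formula, the printed values can be reproduced directly in NumPy (a sketch of the math only, not TensorFlow's implementation):

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0])

relu = np.maximum(0.0, x)                    # max(0, x)
relu6 = np.minimum(np.maximum(0.0, x), 6.0)  # min(max(0, x), 6)
sigmoid = 1.0 / (1.0 + np.exp(-x))           # 1 / (1 + exp(-x))
tanh = np.tanh(x)                            # sinh(x) / cosh(x)
softsign = x / (np.abs(x) + 1.0)             # x / (|x| + 1)
softplus = np.log(np.exp(x) + 1.0)           # log(exp(x) + 1)
elu = np.where(x < 0, np.exp(x) - 1.0, x)    # exp(x) - 1 if x < 0, else x

print(sigmoid)   # ≈ [0.269 0.5 0.731]
print(softsign)  # [-0.5  0.  0.5]
print(elu)       # ≈ [-0.632 0. 1.]
```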

How it works…

These activation functions are the way that we introduce nonlinearities in neural networks or other computational graphs in the future. It is important to note where in our network we are using activation functions. If the activation function has a range between 0 and 1 ( sigmoid ), then the computational graph can only output values between 0 and 1.

If the activation functions are inside and hidden between nodes, then we want to be aware of the effect that the range can have on our tensors as we pass them through. If our tensors were scaled to have a mean of zero, we will want to use an activation function that preserves as much variance as possible around zero. This would imply we want to choose an activation function such as the hyperbolic tangent (tanh) or softsign . If the tensors are all scaled to
be positive, then we would ideally choose an activation function that preserves variance in the positive domain.


We covered:

  1. How TensorFlow Works
  2. Declaring Variables and Tensors
  3. Using Placeholders and Variables
  4. Working with Matrices
  5. Implementing Activation Functions

Google’s TensorFlow engine has a unique way of solving problems. This unique way allows us to solve machine learning problems very efficiently. Machine learning is used in almost all areas of life and work, but some of the more famous areas are computer vision, speech recognition, language translations, and healthcare.

In the next article, we will cover the key components of how TensorFlow operates.