Introduction to Tensorflow Estimators (Part 1)

Tensorflow is an open source numerical computing library, released by Google, for implementing production-ready machine learning models as well as experimenting with novel architectures. It was originally developed by the Google Brain team. Since its release it has been steadily gaining ground on other deep learning libraries like Theano and Caffe.

Several high-level, user-friendly deep learning libraries like Keras and TFLearn have already been built on top of tensorflow. Its flexible architecture allows users to create and deploy machine learning and deep learning models on CPUs, GPUs, distributed machines and even mobile devices. It also provides interfaces for many programming languages, including Python and C++. This tutorial will use Python.

If you'd like some hands-on experience right now, you can use the Tensorflow Playground to experiment with a neural network in your browser. Google uses Tensorflow for search ranking, computer vision (the Inception model), speech recognition, Youtube recommendations, machine translation in Google Translate and in many other areas.

In this article, we'll explore tensorflow and work on a regression problem: predicting Airbnb rental listing prices from the Boston Airbnb Open Data. We'll learn about basic tensorflow concepts like tensors and the computational graph, learn how to execute simple programs, and implement a linear regression model from scratch. Then we will learn how to use the high-level estimator API to quickly build and test models on the Airbnb data.

Airbnb is an online marketplace that helps people lease or rent short-term lodging, including vacation and apartment rentals, homestays and hotels. The data is publicly available on Kaggle. This dataset is part of Airbnb Inside, and the original source can be found here.

Visualization from the popular 'DeepDream' project, implemented with tensorflow. Image credit: Inceptionism: Going Deeper into Neural Networks, Google Research Blog.

Installation

Tensorflow can be installed with pip using the pip install --upgrade tensorflow command in the terminal. We are using tensorflow 1.4 (the latest release as of Dec 2017) for this tutorial. We don't need the GPU version of tensorflow here because the dataset we'll be using is small, so the CPU version is adequate.

Note that on Windows, tensorflow only supports 64-bit Python 3.5.x; it does not support Python 3.6 or 2.7. I'm using Anaconda (with Python 3.5) because it comes with many important data science libraries and jupyter notebook.

To learn how to install tensorflow for your specific OS, see the official installation guides.

The guides contain instructions on how to install tensorflow using pip, virtualenv, Docker and from source. For more information, see the installation home page.

Prerequisites

Since this is an introductory article, the tutorial does not expect previous experience with tensorflow. However, it does expect basic programming skills in Python and familiarity with general machine learning workflow concepts such as feature preprocessing, loss functions, and model training and evaluation.

We will use the pandas library for data preprocessing and some scikit-learn helper functions, but we'll explain them as we go.

Tensorflow fundamentals

Machine learning is advancing at a rapid rate, and to remain relevant a good machine learning framework has to strike a balance between flexibility and simplicity. To implement the novel architectures created by researchers, an ML framework should be extendable and flexible, but regular users often want built-in models that they can readily use on their own datasets.

So Tensorflow has to serve several different classes of users with varied interests: users who want to build their own custom models, users who want to use common models, and users who don't care much about the specifics of the model but want to integrate the results into their own code infrastructure.

To handle these different user demographics, Tensorflow provides multiple APIs at different levels of abstraction. High-level APIs like Keras and tf.estimator are built on top of the tensorflow core functionality. Like Keras, Tensorflow estimators have been integrated into core Tensorflow.

For users who just want to use common models, Tensorflow provides pre-made estimators, or "canned estimators", which are implementations of common machine learning models. We will use the canned estimators in this tutorial to predict Airbnb rental prices with linear regression and learn how to use the estimator API. Before using estimators, however, we'll go through the basic building blocks of tensorflow.
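To give a flavour of where we're heading, a canned estimator takes only a couple of lines to set up. The snippet below is just a sketch with a made-up numeric feature named accommodates, not the final model we'll build on the Airbnb data.

import tensorflow as tf

# A made-up numeric feature, just to illustrate the API shape
feature_columns = [tf.feature_column.numeric_column("accommodates")]

# A pre-made ("canned") linear regression estimator
model = tf.estimator.LinearRegressor(feature_columns=feature_columns)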

We can see how the different parts fit into the tensorflow architecture in this diagram from the Google developer blog post.

Tensors and Computational Graphs

Tensors are the central unit of data in tensorflow. Tensors are like numpy arrays; we can conceptually think of them as the n-dimensional abstraction of matrices. A zero-dimensional tensor is a scalar or a constant. A 1-dimensional tensor is a list or vector. A 2-D tensor is the same as an n x m matrix, where n is the number of rows and m the number of columns. Above that we simply speak of n-dimensional tensors. For example,
a = 3 (treated as a 0-dimensional tensor, or scalar)
a = [3,5] (treated as a 1-D tensor, or vector)
a = [[3,5],[1,1]] (treated as a 2-D tensor, or matrix)
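Jumping slightly ahead to tf.constant, which we'll cover properly in the next section, here is a small sketch that checks the shapes of these three tensors.

import tensorflow as tf

scalar = tf.constant(3)                 # 0-D tensor (scalar)
vector = tf.constant([3, 5])            # 1-D tensor (vector)
matrix = tf.constant([[3, 5], [1, 1]])  # 2-D tensor (matrix)

print(scalar.shape)  # ()
print(vector.shape)  # (2,)
print(matrix.shape)  # (2, 2)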

These tensors are passed to operations that perform computations on them. An 'operation' is commonly known as an 'op'. Operations take zero or more tensors as inputs, perform computations on them and output zero or more tensors. For example, an op like tf.add may take two inputs, 3 and 5, and output their sum, 8.

The tensors and the operations are connected to each other in a computational graph, which is defined by treating the operations as nodes and the tensors as edges: the operations are the mathematical computations performed on the data, and the data flows into the operations through the tensors.

We can visualize a computational graph like this.

Here we have two input tensors that pass the constants 5 and 3 to the add operation and the operation outputs 8.

We’ll code it in the following section.

Constants and Running a Session

import tensorflow as tf
print(tf.__version__)

We will define two constant tensors, a and b, holding the values 5 and 3 with tf.constant, and add them with tf.add as shown in the computational graph.

a = tf.constant(5, name="a")
b = tf.constant(3, name="b")
result = tf.add(a, b, name='add_a_b')
result
# Output
<tf.Tensor 'add_a_b:0' shape=() dtype=int32>

Unfortunately, our code has not produced the expected output. We can think of tensorflow core programs as having two distinct sections: first we define a computational graph that specifies the computations we want to do, then we run the code to get our actual results. We have defined our computational graph in this case, but we have not run it yet.

To evaluate result and get the output, we have to run the code inside a 'session'. A session takes a computational graph, or part of a graph, and executes it. It also holds the intermediate values and results of the computation. We can create an instance of a session object from the tf.Session class.

The following code creates a session and evaluates the output.

sess = tf.Session()
sess.run(result)
# Output 
8
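As a side note, a session holds resources until it is closed, so it's common to run graphs inside a with block, which closes the session automatically. The equivalent sketch looks like this, though we'll keep using the plain sess object in this article so we can reuse it across sections.

# Equivalent to the code above; the session closes automatically
with tf.Session() as temp_sess:
    print(temp_sess.run(result))  # 8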

Variables and Placeholders

In machine learning we want to update the parameters of the model during training, so constants, whose values never change, are not enough; we need some mechanism to add trainable parameters to the computational graph. In Tensorflow we accomplish this with variables. Variables require us to specify an initial value and the data type. We create variables with the tf.Variable op.

A graph can also be fed external inputs using placeholders, so that we can feed an arbitrary number of inputs from the training set to the model. Placeholders act like tensor objects that do not have their values specified, and are created using the tf.placeholder op.

Placeholder values must be fed when we run them. We can use the feed_dict argument of sess.run to feed values to the placeholders. We have to specify the shape and datatype of a placeholder when we add it to the graph. A shape of None indicates that the placeholder can accept input of any shape.

feed_dict is just a dictionary that maps graph elements like variables, constants or placeholders to values. We use it to overwrite the existing values of tensors, and we can also use it to change variable values for a single run.
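For example, we can even override the constant a from the earlier add example for a single run. This small sketch reuses the a, result and sess objects defined above.

# Override the constant a (5) with 10 for this run only
print(sess.run(result, feed_dict={a: 10}))  # prints 13, i.e. 10 + 3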

c = tf.Variable(3, dtype=tf.float32)
d = tf.placeholder(dtype=tf.float32, shape=None)

Unlike constants, variables are not initialized when we call tf.Variable. We have to run a special op, tf.global_variables_initializer, in a session to initialize the variables.

sess.run(tf.global_variables_initializer())
print(sess.run(c, feed_dict={c: 14}))
print(sess.run(d, feed_dict={d: [1, 1, 3, 5]}))
# OUTPUT
14.0
[ 1. 1. 3. 5.]

The variable c was initialized with 3, but we changed it to 14 with the feed_dict parameter. The placeholder didn't have any specified value when we created it, but we fed it a list of values when we ran the code.

Tensorboard

Tensorboard is a visualization tool that comes packaged with tensorflow. It's very useful for visualizing large-scale machine learning models, debugging them and understanding what's going on under the hood. With tensorboard we can also track our loss metrics and other values to see how they change over training steps.

To use tensorboard, we save our graphs by writing summaries about them with summary writers. Summaries are condensed information about models, and tensorboard creates visualizations out of this information.

When creating the summary writer we pass in the directory where our graph log files will be saved and the computational graph we want to save. sess.graph contains the default computational graph for this session, and the writer writes it into the directory provided in the logdir parameter.

writer = tf.summary.FileWriter(logdir="../first_graph", graph=sess.graph)
writer.close()
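Beyond the graph itself, summary writers can also log scalar values such as a loss over training steps. The following is only a sketch: it assumes a loss tensor, a train op and feed values like the ones we build in the linear regression section below, and it writes the summaries before closing the writer.

# Sketch: track a scalar over training steps (assumes loss, train,
# X, Y, x and y as defined in the linear regression section below)
loss_summary = tf.summary.scalar("loss", loss)
for step in range(2000):
    _, summary = sess.run([train, loss_summary], {X: x, Y: y})
    writer.add_summary(summary, global_step=step)
writer.close()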

To run tensorboard, use the following command in the terminal on Linux or macOS.

tensorboard --logdir=path/to/log-directory

On Windows, use tensorboard --logdir path/to/log-directory.

Here logdir is ../first_graph, so from the parent directory we can simply type

tensorboard --logdir=first_graph

Tensorboard’s default port is 6006. So if you go to http://localhost:6006/#graphs tensorboard will be there.

If we go under the Graphs section we can see this visualization of our tiny computational graph. The node is the add operation, which takes in the two inputs a and b.

NOTE: Delete the summary folder if you want to re-run the code from a clean state.
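One quick way to do that from Python, assuming the same log directory as above:

import shutil

# Remove the old summary directory so tensorboard starts from a clean state
shutil.rmtree("../first_graph", ignore_errors=True)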

Linear Regression From Scratch

Linear regression is a simple but powerful technique commonly used in both statistics and machine learning. It's a 'linear' model; in other words, it assumes a linear relationship between the inputs (x) and the outputs (y). Formally, the model assumes that we can get the output value from a linear combination of the input values.

In a simple dataset with only one feature and one output to predict, the equation takes the form y = W*x + b, where W is the weight and b is the bias.

We can see that for different values of the input x, the equation gives us predictions. We can also visualize it.

For a single feature, we try to find the best possible values for the weight and bias parameters using an optimization technique against a loss function, in order to fit a line through the data. Loss functions tell us how good our predicted values are compared to the actual outputs. If we have n features, the general equation becomes y = w1*x1 + w2*x2 + … + wn*xn + b.
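In code, this general form is just a dot product between a weight vector and a feature vector, plus the bias. Here is a small numpy sketch with made-up numbers, separate from the tensorflow model we build below.

import numpy as np

w = np.array([0.5, -1.2, 3.0])  # one weight per feature
b = 2.0                         # bias
x = np.array([1.0, 0.5, 2.0])   # one sample with three features

y_pred = np.dot(w, x) + b       # 0.5 - 0.6 + 6.0 + 2.0
print(y_pred)                   # 7.9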

Here we are going to implement linear regression from scratch for a random dataset with one feature. First we create our model parameters, W and b, which stand for weight and bias.

# Model Parameters
W = tf.Variable([3.0], name='weight')
b = tf.Variable([-2.0], name='bias')

We create two placeholders, X and Y, for the training data and labels. We'll feed the inputs to them during training.

# Model inputs
# training data
X = tf.placeholder(tf.float32)
# labels
Y = tf.placeholder(tf.float32)

We define the model like the equations specified above for a single feature.

# Model definition
predictions = W*X + b

We'll use the sum of squared errors as our loss function. To implement it, for each instance we first calculate the error, which is the difference between the prediction we get from the model and the value we were supposed to get. Then we square the errors and add them up.

Since we are summing the squared errors, or residuals (another name for errors), this approach is also called OLS, or ordinary least squares. Note that errors are calculated for each sample or instance, while the loss tells us how well our model is performing on the whole dataset.

# loss function. Here we use sum of squared errors.
loss = tf.reduce_sum(tf.square(predictions-Y))

Optimizers are used to find the best values for some parameters with respect to some loss function in machine learning models. There are many kinds of optimizers; the one we are using here is called gradient descent. We start with random values for the weight and the bias, and the optimizer updates them in the direction (increasing or decreasing the numbers) that minimizes the value of the loss. We also provide a learning rate, used as a scale factor for the updates.

# training op
train = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

We create some random values for the training set and labels and feed them into the model while running the code. As before, we initialize all our variables before running them using global_variables_initializer. We run the model for 2000 steps here.

# train data
x = [1.1,2.0,3.5,4.8]
y = [2.0,3.4,4.2,5.1]
sess.run(tf.global_variables_initializer())
for train_step in range(2000):
    sess.run(train, {X: x, Y: y})

We can see the final results after executing the model.

weight, bias, curr_loss = sess.run([W, b, loss], {X: x, Y: y})
print("W: %s b: %s loss: %s" % (weight, bias, curr_loss))
# Output 
W: [ 0.84224832] b: [ 1.23781681] loss: 0.288
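Since predictions is just another node in the graph, we can also feed a new input through the trained model:

# Predict for an unseen input with the learned parameters
print(sess.run(predictions, {X: [6.0]}))
# With W of about 0.84 and b of about 1.24, this prints roughly [ 6.29]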

The next part will cover the tensorflow estimators.

NOTE: This is part 1 of a two-part article that I originally wrote as part of the dataquest community writers program. However, as it stands, it apparently does not meet the criteria, and I feel there's a significant difference between how I want to write and the program's expectations, so I've quit the program. I have no idea whether they will ever publish it either. I don't like the idea of the article rotting forever, though, so I've decided to publish it myself.

