Deep Learning: Solving Problems With TensorFlow

Source: Deep Learning on Medium

Learn how to Solve Optimization Problems and Train your First Neural Network with the MNIST Dataset!


The goal of this article is to define and solve practical use cases with TensorFlow. To do so, we will solve:

  • An optimization problem
  • A linear regression problem, where we will fit a regression line to a dataset
  • A classification problem: the “Hello World” of Deep Learning, the MNIST dataset

Optimization Problem

Netflix has decided to place one of their famous posters in a building. The marketing team has decided that the advertising poster has to cover an area of 600 square meters, with a margin of 2 meters above and below and 4 meters left and right.

However, they have not been informed of the dimensions of the building’s facade. We could send an email to the owner and ask, but since we know some mathematics we can work it out ourselves. Assuming the facade is the smallest one that fits the poster plus its margins, what are its dimensions?

Let x and y be the width and height of the poster. The total area of the facade is then:

Width = 4 + x + 4 = x + 8

Height = 2 + y + 2 = y + 4

Area = Width × Height = (x + 8)(y + 4)

And there is the constraint: x·y = 600

This allows us to write the area as a function of y alone:

x·y = 600 → x = 600/y

S(y) = (600/y + 8)(y + 4) = 600 + 2400/y + 8y + 32 = 632 + 8y + 2400/y

In an optimization problem, the slope of the function (its derivative) is used to find the minimum. We set the first derivative equal to 0 and then check that the second derivative is positive at that point. So, in this case:

S’(y) = 8 - 2400/y²

S’’(y) = 4800/y³

S’(y) = 0 → 8 = 2400/y² → y² = 2400/8 = 300 → y = sqrt(300) = sqrt(100·3) = sqrt(100)·sqrt(3) = 10·sqrt(3) ≈ 17.32 (we discard the negative root because it has no physical meaning)

Substituting in x:

x = 600 / (10·sqrt(3)) = 60 / sqrt(3) = 60·sqrt(3) / (sqrt(3)·sqrt(3)) = 60·sqrt(3) / 3 = 20·sqrt(3) ≈ 34.64

Since S’’(17.32) ≈ 0.9238 > 0, this point is indeed a minimum.

Therefore, the dimensions of the building are:

Width: x + 8 = 42.64 m

Height: y + 4 = 21.32 m
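
As a quick numerical check of the algebra above, the closed-form values can be recomputed with Python's standard library. This is only a sketch; the function and variable names are illustrative and not part of the article's original code.

```python
import math

def total_area(y):
    """Facade area S(y) = 632 + 8y + 2400/y for a poster of height y."""
    return 632.0 + 8.0 * y + 2400.0 / y

# Stationary point of S: S'(y) = 8 - 2400/y^2 = 0  ->  y = sqrt(300)
y_opt = math.sqrt(300.0)
x_opt = 600.0 / y_opt                      # from the constraint x * y = 600
second_derivative = 4800.0 / y_opt ** 3    # S''(y) evaluated at the stationary point

print("y =", round(y_opt, 2), "x =", round(x_opt, 2))
print("S''(y) =", round(second_derivative, 4))
print("width =", round(x_opt + 8, 2), "height =", round(y_opt + 4, 2))
```

Evaluating total_area(y_opt) gives the same number as multiplying the width by the height, which confirms that S(y) was expanded correctly.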

Have you seen how useful derivatives are? We just solved this problem analytically. That was possible because the problem is simple, but many problems have no closed-form solution or are too expensive to solve analytically, so we use numerical methods instead. One of these methods is Gradient Descent.
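
Before bringing in a library, the idea behind Gradient Descent can be sketched in a few lines of plain Python: start from a guess and keep moving against the slope. The starting point (30), the learning rate (0.1) and the number of steps are assumptions chosen for illustration, not values from the article.

```python
def grad_s(y):
    """Derivative of S(y) = 632 + 8y + 2400/y."""
    return 8.0 - 2400.0 / y ** 2

def gradient_descent(y0, learning_rate=0.1, n_steps=2000):
    """Repeatedly step against the slope, starting from y0."""
    y = y0
    for _ in range(n_steps):
        y -= learning_rate * grad_s(y)
    return y

y_min = gradient_descent(y0=30.0)
print("y =", y_min, "x =", 600.0 / y_min)
```

With these settings the iterate approaches sqrt(300) from above. A much larger learning rate or a starting point close to 0 can make the steps overshoot, so the values would need tuning for other functions.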

What do you say if we solve this problem numerically this time, with TensorFlow? Let’s go!

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

x = tf.Variable(initial_value=tf.random_uniform([1], 34., 35.), name='x')  # not used by the loss
y = tf.Variable(initial_value=tf.random_uniform([1], 0., 50.), name='y')

# Loss function: S(y) = 632 + 8y + 2400/y
s = tf.add(tf.add(632.0, tf.multiply(8.0, y)), tf.divide(2400.0, y), 's')

opt = tf.train.GradientDescentOptimizer(0.05)
train = opt.minimize(s)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

old_solution = 0
tolerance = 1e-4
for step in range(500):
    sess.run(train)
    solution = sess.run(y)
    # Stop once y changes by less than the tolerance between two steps
    if np.abs(solution - old_solution) < tolerance:
        print("The solution is y = {}".format(old_solution))
        break

    old_solution = solution
    if step % 10 == 0:
        print(step, "y = " + str(old_solution), "s = " + str(sess.run(s)))

We have managed to calculate y using the gradient descent algorithm. Of course, we now need to calculate x substituting x = 600/y.

x = 600/old_solution[0]

This matches our analytical result, so it seems to work! Let’s plot the function:

import matplotlib.pyplot as plt

y = np.linspace(1., 400., 500)  # start above 0 to avoid dividing by zero
s = 632.0 + 8*y + 2400/y
plt.plot(y, s)
print("The function minimum is {}".format(np.min(s)))
min_s = np.min(s)
s_min_idx = np.nonzero(s == min_s)
y_min = y[s_min_idx]
print("The y value that reaches the minimum is {}".format(y_min[0]))

Let’s See Another Example

In this case, we want to find the minimum of the function y = log²(x), that is, the square of the natural logarithm of x.

x = tf.Variable(15, name='x', dtype=tf.float32)
log_x = tf.log(x)
log_x_squared = tf.square(log_x)

optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(log_x_squared)

init = tf.global_variables_initializer()

def optimize():
    with tf.Session() as session:
        session.run(init)
        print("starting at", "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))
        for step in range(100):
            session.run(train)
            print("step", step, "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))

optimize()
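
As a sanity check on the TensorFlow run above, the same descent can be sketched in plain Python using the hand-derived gradient d/dx ln(x)² = 2·ln(x)/x, which is zero at x = 1. The starting point and learning rate mirror the TensorFlow code; the function name is illustrative.

```python
import math

def grad_log_squared(x):
    """Derivative of f(x) = ln(x)^2, i.e. 2 * ln(x) / x."""
    return 2.0 * math.log(x) / x

x = 15.0               # same starting point as the TensorFlow variable
learning_rate = 0.5    # same step size as the optimizer above
for step in range(100):
    x -= learning_rate * grad_log_squared(x)

print("x =", x, "log(x)^2 =", math.log(x) ** 2)
```

The iterate moves steadily down towards x = 1, where ln(x)² reaches its minimum value of 0.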


Let’s plot it!

x_values = np.linspace(0.01, 10, 100)  # start above 0, where log(x) is undefined
fx = np.log(x_values)**2
plt.plot(x_values, fx)
print("The function minimum is {}".format(np.min(fx)))
min_fx = np.min(fx)
fx_min_idx = np.nonzero(fx == min_fx)
x_min_value = x_values[fx_min_idx]
print("The x value that reaches the minimum is {}".format(x_min_value[0]))

Let’s Solve a Linear Regression Problem

Let’s see how to fit a straight line to a dataset that represents the intelligence of every character in The Simpsons, from Ralph Wiggum to Doctor Frink.

Let’s plot the distribution of intelligence against the age, normalized from 0 to 1, where Maggie is the youngest and Montgomery Burns the oldest:

n_observations = 50
_, ax = plt.subplots(1, 1)
xs = np.linspace(0., 1., n_observations)
ys = 100 * np.sin(xs) + np.random.uniform(0., 50., n_observations)
ax.scatter(xs, ys)

Now, we need two tf.placeholders: one for the input and one for the output of our regression algorithm. Placeholders are variables that do not need to be assigned a value until the graph is executed.

X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

Let’s try to fit a straight line by linear regression. We need two variables, the weight (W) and the bias (b). Elements of type tf.Variable need to be initialized, and their type cannot be changed after being declared. What we can change is their value, with the “assign” method.

W = tf.Variable(tf.random_normal([1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')
Y_pred = tf.add(tf.multiply(X, W), b)

Let’s now define the cost function as the mean squared difference between our predictions and the real values.

loss = tf.reduce_mean(tf.pow(Y_pred - Y, 2))

Now we define the optimization method: we will use gradient descent. Basically, it calculates how the total error varies with respect to each weight, and updates each weight so that the total error decreases in subsequent iterations. The learning rate controls how abruptly the weights are updated.

learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# Definition of the number of iterations and start of the initialization using the GPU
n_epochs = 1000

with tf.Session() as sess:
    with tf.device("/GPU:0"):  # assumes a GPU is available
        # We now initialize all the defined variables
        sess.run(tf.global_variables_initializer())

        # Start the fit
        prev_training_loss = 0.0
        for epoch_i in range(n_epochs):
            for (x, y) in zip(xs, ys):
                sess.run(optimizer, feed_dict={X: x, Y: y})

            W_, b_, training_loss = sess.run([W, b, loss], feed_dict={X: xs, Y: ys})

            # We print the losses every 20 epochs
            if epoch_i % 20 == 0:
                print(training_loss)

            # Ending condition: stop when the loss no longer changes
            if np.abs(prev_training_loss - training_loss) < 0.000001:
                print(W_, b_)
                break
            prev_training_loss = training_loss

        # Plot of the result
        plt.scatter(xs, ys)
        plt.plot(xs, Y_pred.eval(feed_dict={X: xs}, session=sess))

And we have it! With this regression line we can predict the intelligence of any Simpsons character from their age.
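
To make the weight update explicit, here is a minimal sketch of the same fit in plain Python. It is not the article's method: it uses batch gradient descent (one update per pass over all points) rather than the per-sample updates of the TensorFlow loop, and the fixed random seed, learning rate and epoch count are assumptions chosen for repeatability.

```python
import math
import random

rng = random.Random(0)  # fixed seed, an assumption so the run is repeatable
n_observations = 50
xs = [i / (n_observations - 1) for i in range(n_observations)]
ys = [100 * math.sin(x) + rng.uniform(0.0, 50.0) for x in xs]

def fit_line(xs, ys, learning_rate=0.5, n_epochs=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((w*x + b - y)^2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = fit_line(xs, ys)
print("w =", w, "b =", b)
```

For a problem this small, the result can be compared against the closed-form least-squares slope and intercept, which is a handy way to confirm that the descent has converged.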

MNIST Dataset

Let’s now see how to classify images of digits with a logistic regression. We will use the “Hello World” of Deep Learning datasets.

Let’s import the relevant libraries and the dataset MNIST:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

We load the dataset by encoding the labels with one-hot encoding (it converts each label into a vector of length = N_CLASSES, with all 0s except for the index that indicates the class to which the image belongs, which contains a 1). For example, if we have 10 classes (numbers from 0 to 9), and the label belongs to number 5: label = [0 0 0 0 0 1 0 0 0 0].
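
One-hot encoding is simple enough to sketch by hand. The helper below is hypothetical (it is not part of TensorFlow or of the article's code) and only illustrates what the encoding produces for a single label.

```python
def one_hot(label, n_classes):
    """Return a list of n_classes zeros with a single 1 at position `label`."""
    if not 0 <= label < n_classes:
        raise ValueError("label must be in [0, n_classes)")
    encoded = [0] * n_classes
    encoded[label] = 1
    return encoded

print(one_hot(5, 10))  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

The one_hot=True flag in the loading code below asks the MNIST reader to apply this encoding to every label.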

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
print("Train examples: {}".format(mnist.train.num_examples))
print("Test examples: {}".format(mnist.test.num_examples))
print("Validation examples: {}".format(mnist.validation.num_examples))
# Images are stored in a 2D tensor: images_number x image_pixels_vector
# Labels are stored in a 2D tensor: images_number x classes_number (one-hot)

print("Images Size train: {}".format(mnist.train.images.shape))
print("Labels Size train: {}".format(mnist.train.labels.shape))
# To see the range of the images values
print("Min value: {}".format(np.min(mnist.train.images)))
print("Max value: {}".format(np.max(mnist.train.images)))
# To see some images we will access a vector of the dataset and reshape it to 28x28
plt.imshow(np.reshape(mnist.train.images[0, :], (28, 28)), cmap='gray')
plt.imshow(np.reshape(mnist.train.images[27500, :], (28, 28)), cmap='gray')
plt.imshow(np.reshape(mnist.train.images[54999, :], (28, 28)), cmap='gray')

We have already seen a little of what the MNIST dataset consists of. Now, let’s create our regressor:

First, we create the placeholder for our input data. In this case, the input is going to be a set of vectors of size 784 (we pass several images at once to our regressor; this way, the gradient is computed over several images, so its estimate is more precise than if it used only one).

n_input = 784 # Number of data features: number of pixels of the image
n_output = 10 # Number of classes: from 0 to 9
net_input = tf.placeholder(tf.float32, [None, n_input]) # We create the placeholder

Let’s define now the regression equation: y = W*x + b

W = tf.Variable(tf.zeros([n_input, n_output]))
b = tf.Variable(tf.zeros([n_output]))

As the output is multiclass, we need a function that returns the probabilities of an image belonging to each of the possible classes. For example, if we put in an image with a 5, a possible output would be: [0.05 0.05 0.05 0.05 0.05 0.55 0.05 0.05 0.05 0.05], whose probabilities sum to 1, and where the class with the highest probability is 5.

We apply the softmax function to normalize the output probabilities:

net_output = tf.nn.softmax(tf.matmul(net_input, W) + b)

Figure: the SoftMax function
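
To see what the softmax normalization does to a vector of raw scores, here is a minimal sketch in plain Python. It is an illustration only, not how tf.nn.softmax is implemented internally; subtracting the maximum score is a common trick, assumed here, to keep the exponentials from overflowing.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 5.0])
print(probs, sum(probs))
```

The largest score always receives the largest probability, so taking the argmax of the softmax output gives the predicted class.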

# We also need a placeholder for the image label, with which we will compare our prediction
y_true = tf.placeholder(tf.float32, [None, n_output])

# And finally, we define our loss function: cross entropy
cross_entropy = -tf.reduce_sum(y_true * tf.log(net_output))

# We check if our prediction matches the label
idx_prediction = tf.argmax(net_output, 1)
idx_label = tf.argmax(y_true, 1)
correct_prediction = tf.equal(idx_prediction, idx_label)

# We define our measure of accuracy as the number of hits in relation to the number of predicted samples
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

# We now indicate that we want to minimize our loss function (the cross entropy) by using the gradient descent algorithm with a learning rate of 0.01
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
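
The two quantities defined above, the cross entropy and the accuracy, can be worked through by hand on a tiny example. The following sketch uses plain Python lists instead of tensors; the function names and the sample values are made up for illustration, and the predicted probabilities are assumed to be strictly positive so that the logarithm is defined.

```python
import math

def cross_entropy(y_true, y_pred):
    """Sum of -t * log(p) over all samples and classes (one-hot targets)."""
    return -sum(t * math.log(p)
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    hits = sum(t.index(max(t)) == p.index(max(p)) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

labels = [[0, 1, 0], [1, 0, 0]]
predictions = [[0.2, 0.7, 0.1], [0.3, 0.6, 0.1]]
print(cross_entropy(labels, predictions), accuracy(labels, predictions))
```

Only the probability assigned to the true class contributes to the loss, so the second sample, where the true class received 0.3, is penalized more than the first, where it received 0.7.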

Everything is now set up! Let’s execute the graph:

from IPython.display import clear_output

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Let's train the regressor
    batch_size = 10
    for sample_i in range(mnist.train.num_examples):
        sample_x, sample_y = mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={net_input: sample_x,
                                       y_true: sample_y})

        # Let's check how the regressor is performing
        if sample_i < 50 or sample_i % 200 == 0:
            val_acc = sess.run(accuracy, feed_dict={net_input: mnist.validation.images, y_true: mnist.validation.labels})
            print("({}/{}) Acc: {}".format(sample_i, mnist.train.num_examples, val_acc))

    # Let's show the final accuracy
    print('Test accuracy: ', sess.run(accuracy, feed_dict={net_input: mnist.test.images, y_true: mnist.test.labels}))

We have just trained our first NEURAL NETWORK with TensorFlow!

Think a little bit about what we just did.

We have implemented a logistic regression, with this formula: y = G(Wx + b), where G = softmax() instead of the typical G = sigmoid().

If you look at the usual diagram of the perceptron (a single-layer neural network), you can see that output = Activation_function(Wx). You see? Only the bias seems to be missing! But notice that the first input is a 1, so the weight w0 is not multiplied by any input value. Exactly! The weight w0 is the bias, which appears with this notation simply so that it can be implemented as a matrix multiplication.
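
The equivalence between an explicit bias and a weight w0 attached to a constant input of 1 can be checked with a small sketch. The two helper functions and the sample numbers are hypothetical, chosen only to illustrate the notation.

```python
def neuron(weights, bias, inputs):
    """Weighted sum with an explicit bias term: w . x + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def neuron_with_w0(weights_with_bias, inputs):
    """Same sum, with the bias stored as w0 and a constant 1 prepended to the input."""
    extended = [1.0] + list(inputs)
    return sum(w * x for w, x in zip(weights_with_bias, extended))

weights, bias, inputs = [0.5, -2.0, 1.5], 0.25, [2.0, 1.0, 4.0]
print(neuron(weights, bias, inputs), neuron_with_w0([bias] + weights, inputs))
```

Both calls return the same value, which is why the bias can be folded into the weight matrix and the whole layer computed as a single matrix multiplication.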

So, what we have just implemented is a perceptron, with

  • batch_size = 10
  • 10 epochs (55,000 steps of 10 images each)
  • gradient descent as optimizer
  • and softmax as activation function.

Final Words

As always, I hope you enjoyed the post, that you have learned how to use TensorFlow to solve linear problems, and that you have successfully trained your first Neural Network!

If you liked this post, you can take a look at my other posts on Data Science and Machine Learning.

If you want to learn more about Machine Learning and Artificial Intelligence follow me on Medium, and stay tuned for my next posts!