Source: Deep Learning on Medium


An autoencoder, also known as an autoassociator or Diabolo network, is an artificial neural network trained to recreate its own input.

It takes a set of **unlabeled** inputs, encodes them, and then tries to extract the most valuable information from them.

Autoencoders are used for feature extraction, learning generative models of data, and dimensionality reduction, and can also be used for compression.

A 2006 paper, Reducing the Dimensionality of Data with Neural Networks by G. E. Hinton and R. R. Salakhutdinov, showed better results than years of refining other types of networks, and was a breakthrough in the field of neural networks, a field that had been "stagnant" for 10 years.

Today, autoencoders based on Restricted Boltzmann Machines are employed in some of the largest deep learning applications; they are the building blocks of Deep Belief Networks (DBNs).


An example given by Nikhil Buduma illustrates the utility of this type of neural network well.

Say you want to determine what emotion the person in a photograph is expressing, using the following 256×256 grayscale picture as an example:

But then we hit a bottleneck! A 256×256 image corresponds to an input vector of 65,536 dimensions! If we used an image produced by a conventional cellphone camera, which generates images of 4000×3000 pixels, we would have 12 million dimensions to analyse.

This bottleneck is compounded because the difficulty of a machine learning problem increases as more dimensions are involved. According to a 1982 study by C. J. Stone, the time to fit a model is, at best, proportional to:

m^(-p / (2p + d))

m: Number of data points

d: Dimensionality of the data

p: Parameter that depends on the model

As you can see, as d grows the exponent approaches zero, so the number of data points needed to fit a model well grows exponentially with the dimensionality!
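To get a feel for this, we can evaluate the rate for a fixed dataset size and growing dimensionality (a minimal sketch; the values of m and p below are arbitrary illustrative choices, not taken from Stone's study):

```python
# Sketch: how the best-case rate m**(-p / (2*p + d)) degrades as
# dimensionality d grows. m and p are arbitrary illustrative values.
def stone_rate(m, d, p):
    return m ** (-p / (2 * p + d))

m, p = 10_000, 2  # 10,000 data points, model parameter p = 2
for d in (4, 100, 65_536):
    print(d, stone_rate(m, d, p))
```

With d = 4 the rate is 0.1, but as d climbs toward the 65,536 dimensions of our image the rate creeps toward 1, meaning extra data buys almost no improvement.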

Returning to our example: we don't need all 65,536 dimensions to classify an emotion. A human identifies emotions from specific facial expressions, some **key features**, like the shape of the mouth and the eyebrows.

Key Features


An autoencoder can be divided into two parts, the **encoder** and the **decoder**.

The encoder compresses the representation of an input. In this case we are going to compress the face of our actor, which consists of 2000-dimensional data, down to only 30 dimensions, taking some intermediate steps along the way.

The decoder is a mirror image of the encoder network. It tries to recreate the input as closely as possible, and it plays an important role during training: it forces the autoencoder to select the most important features for the compressed representation.


After training, you can use the encoded data as a reliable dimensionality-reduced representation, applying it to any problem that dimensionality reduction seems to fit.

This image, extracted from Hinton's paper, compares the two-dimensional reductions of 500 MNIST digits, with PCA on the left and the autoencoder on the right. We can see that the autoencoder provides a better separation of the data.
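For reference, the linear projection on the left of that comparison is ordinary PCA, which can be sketched in a few lines of NumPy (a minimal sketch; random data stands in here for the actual MNIST vectors):

```python
import numpy as np

# Minimal PCA sketch: project data onto its top-2 principal components.
# Random data stands in for 500 flattened 28x28 MNIST images (784-dim).
rng = np.random.RandomState(0)
data = rng.rand(500, 784)

centered = data - data.mean(axis=0)   # PCA requires centered data
_, _, vt = np.linalg.svd(centered, full_matrices=False)
codes_2d = centered @ vt[:2].T        # 2-D codes, like the left panel

print(codes_2d.shape)  # (500, 2)
```

Unlike the autoencoder, this projection is purely linear, which is why its clusters overlap more.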


We are going to use the MNIST dataset for our example.

from __future__ import division, print_function, absolute_import

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Now, let's define the parameters that will be used by our NN.

learning_rate = 0.01
training_epochs = 20
batch_size = 256
display_step = 1
examples_to_show = 10

# Network Parameters
n_hidden_1 = 256 # 1st layer num features
n_hidden_2 = 128 # 2nd layer num features
n_input = 784 # MNIST data input (img shape: 28*28)

# tf Graph input (only pictures)
X = tf.placeholder("float", [None, n_input])

weights = {
    'encoder_h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'encoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'decoder_h1': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_1])),
    'decoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_input])),
}
biases = {
    'encoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'encoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'decoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'decoder_b2': tf.Variable(tf.random_normal([n_input])),
}

Now we need to create our encoder. For this, we are going to use sigmoid activation functions, which continue to deliver great results with this type of network. This is largely because the sigmoid has a simple, well-behaved derivative that suits backpropagation. We can create our encoder using the sigmoid function like this:

# Building the encoder
def encoder(x):
    # Encoder first layer with sigmoid activation
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']),
                                   biases['encoder_b1']))
    # Encoder second layer with sigmoid activation
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']),
                                   biases['encoder_b2']))
    return layer_2
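The "good derivative" mentioned above is the identity σ′(x) = σ(x)(1 − σ(x)), which makes the backpropagated gradients cheap to compute. A quick numerical check, in plain Python and independent of TensorFlow:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Analytic derivative: sigma'(x) = sigma(x) * (1 - sigma(x))
def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare against a numerical (central-difference) derivative
x, h = 0.7, 1e-6
numerical = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(abs(numerical - sigmoid_grad(x)))  # a tiny discrepancy
```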

And the decoder:

You can see that layer_1 of the decoder mirrors layer_2 of the encoder, and vice versa: the decoder expands the representation back out through the same layer sizes in reverse order.

# Building the decoder
def decoder(x):
    # Decoder first layer with sigmoid activation
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']),
                                   biases['decoder_b1']))
    # Decoder second layer with sigmoid activation
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']),
                                   biases['decoder_b2']))
    return layer_2
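Given the weight shapes defined earlier, the mirroring is easy to verify: each decoder matrix has the reversed shape of its encoder counterpart (a small sanity check in plain Python):

```python
# Layer sizes from the network parameters above: 784 -> 256 -> 128 -> 256 -> 784
shapes = {
    'encoder_h1': (784, 256), 'encoder_h2': (256, 128),
    'decoder_h1': (128, 256), 'decoder_h2': (256, 784),
}

# Each decoder matrix reverses the shape of the matching encoder matrix
assert shapes['decoder_h1'] == tuple(reversed(shapes['encoder_h2']))
assert shapes['decoder_h2'] == tuple(reversed(shapes['encoder_h1']))
print("decoder shapes mirror the encoder")
```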

Let’s construct our model.

In the `cost` variable we have the loss function, and in the `optimizer` variable we have the optimization operation that applies the gradients computed by backpropagation.

# Construct model
encoder_op = encoder(X)
decoder_op = decoder(encoder_op)

# Prediction
y_pred = decoder_op
# Targets (Labels) are the input data.
y_true = X

# Define loss and optimizer, minimize the squared error
cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()
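The cost above is simply the mean squared error between the input pixels and their reconstructions. A NumPy sketch of the same quantity, on toy arrays:

```python
import numpy as np

# Mean squared error: the same quantity the TensorFlow cost computes
y_true = np.array([1.0, 0.0, 1.0, 0.0])   # "input pixels"
y_pred = np.array([0.9, 0.1, 0.8, 0.2])   # "reconstructed pixels"

mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.025
```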

The training will run for 20 epochs.

# Launch the graph
# Using InteractiveSession (more convenient while using Notebooks)
sess = tf.InteractiveSession()
sess.run(init)

total_batch = int(mnist.train.num_examples/batch_size)
# Training cycle
for epoch in range(training_epochs):
    # Loop over all batches
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop) and cost op (to get loss value)
        _, c = sess.run([optimizer, cost], feed_dict={X: batch_xs})
    # Display logs per epoch step
    if epoch % display_step == 0:
        print("Epoch:", '%04d' % (epoch+1),
              "cost=", "{:.9f}".format(c))

print("Optimization Finished!")

Now, let's apply encode and decode to our test set.

# Applying encode and decode over test set
encode_decode = sess.run(
    y_pred, feed_dict={X: mnist.test.images[:examples_to_show]})

Now let's visualize the results!

# Compare original images with their reconstructions
f, a = plt.subplots(2, 10, figsize=(10, 2))
for i in range(examples_to_show):
    a[0][i].imshow(np.reshape(mnist.test.images[i], (28, 28)))
    a[1][i].imshow(np.reshape(encode_decode[i], (28, 28)))

As you can see, the reconstructions were successful, although some noise was introduced into the images.

The complete code can be downloaded here