Implementing a Simple Auto-Encoder in Tensorflow

Source: Deep Learning on Medium


Peele using DeepFake to forge a video of Obama — Source: BuzzFeed, YouTube
By Edoardo Barp

Generative Adversarial Networks (GANs) have recently risen in popularity by showing off some of their capabilities. They first did so by imitating famous painters’ art styles, and more recently through DeepFake, which makes it possible to seamlessly replace facial expressions in videos while keeping a high output quality.


Auto-encoders are one of the building blocks behind these techniques. An auto-encoder is a neural network with two properties. First, it is trained to reproduce its input at its output, so the input and the target data are the same. Second, it includes a layer of lower dimension than the input. At first this might sound confusing and useless. However, by training the network to copy the input data through this “bottleneck”, we are really getting it to learn a “compressed” version of the data and then to de-compress it. In jargon, this amounts to finding a “latent space” representation of our data.
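The following toy sketch makes the “compression” idea concrete. It is not from the original code. If the data happen to lie along a single direction in 3-D space, a bottleneck of dimension 1 loses nothing. The unit vector `u` plays the role of hand-picked encoder/decoder weights, whereas a real auto-encoder would have to learn them.

```python
# Hand-picked unit vector (1/9 + 4/9 + 4/9 = 1): every data point below is a multiple of it.
u = [1 / 3, 2 / 3, 2 / 3]

def encode(x):
    # Project the 3-D point onto u: 3 numbers -> 1 number (the latent code).
    return sum(ui * xi for ui, xi in zip(u, x))

def decode(z):
    # Map the 1-D code back to 3-D along the same direction.
    return [z * ui for ui in u]

data = [[t * ui for ui in u] for t in (-2.0, 0.5, 3.0)]
codes = [encode(x) for x in data]
recon = [decode(z) for z in codes]

for x, x_hat in zip(data, recon):
    print([round(v, 6) for v in x], "->", [round(v, 6) for v in x_hat])
```

A point that does not lie on that line would come back only as its projection onto `u`. The auto-encoder’s job is to find the directions (or, with non-linear layers, curved surfaces) that lose as little as possible.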

In this post, I will explain how to implement an auto-encoder in Python, and how to use it in practice to encode and decode data. I assume that you have Python 3 and TensorFlow already installed and working, although the code needs only minimal changes to work on Python 2. The code uses the TensorFlow 1.x API (tf.placeholder, tf.Session and so on). Under TensorFlow 2, these functions live in tf.compat.v1.

So, a good auto-encoder must:
1. “Compress” the data, i.e. latent dimension < input dimension
2. Replicate the data well (duh!)
3. Allow us to get the latent representation a.k.a. encoding
4. Allow us to decode an encoded representation

Source: another very interesting article, which explains auto-encoders and uses them to remove noise from images.

I will show you two ways to do this. The first is rigorous (or pedantic, depending on your point of view) and uses the low-level TensorFlow API. The second is faster and more casual, and takes advantage of the Keras API. The first one is nevertheless necessary if you want to properly understand the inner workings of the second.

Tensorflow Low Level Implementation

Only the coolest kids use this one — Bill Gates

This implementation makes direct use of the TensorFlow core API, which requires some prerequisite knowledge. I’ll briefly explain three fundamental concepts: tf.placeholder, tf.Variable, and tf.Tensor.

A tf.placeholder is simply a “variable” that will be an input to the model but is not part of the model per se. It lets us tell the model to expect a value of a certain type and shape, which is supplied only when the model is run. This is very similar to a variable declaration in strongly typed languages.

A tf.Variable is pretty much the same as a variable in other programming languages. It holds a value that persists between runs of the model and can be updated. The network’s weights, for instance, are tf.Variables, and they are what the optimiser modifies during training.

A tf.Tensor is slightly more complex. In our case, we can think of it as an object that contains the symbolic representation of an operation. For instance, given a placeholder X and a weight variable W, the generic representation of the matrix multiplication W X is a tensor. Its result, given specific values of X and W, is not a tensor: it is a concrete array (a NumPy array, when retrieved through a session).
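A tiny deferred-evaluation graph in plain Python can illustrate the difference between a symbolic tensor and its concrete value. The classes `Placeholder`, `Variable`, `MatMul` and the `run` function below are hypothetical and written for illustration. They are not TensorFlow’s API. They only mimic the idea that building the product of X and W records an operation, and a value appears only when the graph is run with concrete inputs.

```python
# Toy stand-ins for tf.placeholder / tf.Variable / tf.Tensor (hypothetical, not TensorFlow).
class Placeholder:
    """An input slot: it has no value until one is fed in."""
    def __init__(self, name):
        self.name = name

class Variable:
    """Holds a value that persists between runs (e.g. weights)."""
    def __init__(self, value):
        self.value = value

class MatMul:
    """A symbolic node: records that a and b will be multiplied, not the result."""
    def __init__(self, a, b):
        self.a, self.b = a, b

def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def run(node, feed_dict):
    """Evaluate a node, like a (much simplified) Session.run."""
    if node in feed_dict:  # fed values take priority, even for intermediate nodes
        return feed_dict[node]
    if isinstance(node, Variable):
        return node.value
    if isinstance(node, Placeholder):
        raise ValueError(f"placeholder {node.name!r} must be fed")
    if isinstance(node, MatMul):
        return matmul(run(node.a, feed_dict), run(node.b, feed_dict))
    raise TypeError(node)

X = Placeholder("X")
W = Variable([[1.0, 0.0], [0.0, 2.0]])
Y = MatMul(X, W)  # symbolic: nothing has been computed yet

result = run(Y, {X: [[3.0, 4.0]]})
print(result)  # [[3.0, 8.0]]
```

`Y` stays a node no matter how often it is run. Only `result`, a plain list, holds numbers.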


Now that the introductions are made, building the network is rather simple, if pedantic. We will use the MNIST dataset of handwritten digits, which are stored as 28×28 pictures. We define D as the input dimension, in this case the flattened image size, 28 × 28 = 784. We define d as the encoding dimension, which I set to 128. The network then has 3 layers with the following dimensions: D, d, D.
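As a quick sanity check on these sizes, the following snippet counts the trainable parameters of the D, d, D network. It is an addition, not part of the original code. Each dense layer contributes one weight matrix and one bias vector.

```python
D, d = 28 * 28, 128  # flattened MNIST image size and encoding size

def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out

encoder = dense_params(D, d)  # W1, b1
decoder = dense_params(d, D)  # W2, b2
total = encoder + decoder

print(D, encoder, decoder, total)  # 784 100480 101136 201616
print(f"compression factor: {D / d:.3f}")  # compression factor: 6.125
```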

We will now implement a simple Auto-encoder class, which will be explained line by line. There’s also a full version of the code available here if you’re interested.

Let’s go! First, we need a placeholder for the input data:

self.X = tf.placeholder(tf.float32, shape=(None, D))

Then, we start the encoding phase, by defining the first layer of weights, with the extra bias, as variables:

self.W1 = tf.Variable(tf.random_normal(shape=(D,d)))
self.b1 = tf.Variable(np.zeros(d).astype(np.float32))

Notice that the shape of the weight matrix is D×d, going from the higher to the lower dimension. Next, we create the tensor for the bottleneck layer: the product of the input and the weights, plus the bias, all passed through a ReLU activation.

self.Z = tf.nn.relu( tf.matmul(self.X, self.W1) + self.b1 )

We then proceed to the decoding phase, which is identical to the encoding but going from lower to higher dimensions.

self.W2 = tf.Variable(tf.random_normal(shape=(d,D)))
self.b2 = tf.Variable(np.zeros(D).astype(np.float32))

Finally, we define the logits and the output tensor X_hat, which is the network’s prediction. I chose sigmoid activation for simplicity, since its output always lies in the interval (0, 1). This matches the [0, 1] range of the normalised input pixels.

logits = tf.matmul(self.Z, self.W2) + self.b2 
self.X_hat = tf.nn.sigmoid(logits)
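The graph built so far can be mirrored in NumPy to check shapes and output ranges without TensorFlow. This sketch uses random, untrained weights and a random stand-in batch rather than real MNIST images. It computes the same expressions as the tensors Z, logits and X_hat above. The sigmoid is written as 0.5·(1 + tanh(x/2)), an exact identity that avoids overflow warnings for large logits.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, batch = 784, 128, 5

# Same initialisation scheme as above: standard-normal weights, zero biases.
W1 = rng.normal(size=(D, d)).astype(np.float32)
b1 = np.zeros(d, dtype=np.float32)
W2 = rng.normal(size=(d, D)).astype(np.float32)
b2 = np.zeros(D, dtype=np.float32)

# Stand-in for a batch of normalised pixels in [0, 1].
X = rng.random((batch, D)).astype(np.float32)

Z = np.maximum(X @ W1 + b1, 0.0)           # encoder: relu(X W1 + b1)
logits = Z @ W2 + b2                       # decoder pre-activation
X_hat = 0.5 * (1.0 + np.tanh(0.5 * logits))  # sigmoid(logits), overflow-free form

print(Z.shape, X_hat.shape)  # (5, 128) (5, 784)
```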

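That’s it for the network! We only need a loss function and an optimiser, and we’re ready to start training. The chosen loss is the sigmoid cross-entropy, which means we treat the problem as a binary classification at the pixel level. This makes sense for MNIST, whose pixels are mostly close to pure black or pure white. Note that the loss receives the pre-activation logits rather than X_hat, because tf.nn.sigmoid_cross_entropy_with_logits applies the sigmoid internally.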

Regarding the optimiser, this is pretty much archaic sorcery. Maybe we should create a model which outputs which optimiser to use, given a problem?

self.cost = tf.reduce_sum(
    tf.nn.sigmoid_cross_entropy_with_logits(
        # Expected result (a.k.a. the input itself, for an auto-encoder)
        labels=self.X,
        logits=logits
    )
)
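For reference, the per-pixel quantity computed by `tf.nn.sigmoid_cross_entropy_with_logits` is the binary cross-entropy between the label z and sigmoid(x), where x is the logit. TensorFlow’s documentation gives the numerically stable form `max(x, 0) - x * z + log(1 + exp(-abs(x)))`. The pure-Python sketch below checks that the naive and stable forms agree on a few values. It is an illustration, not TensorFlow’s code.

```python
import math

def bce_naive(x, z):
    # -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x)); breaks down for large |x|
    s = 1.0 / (1.0 + math.exp(-x))
    return -z * math.log(s) - (1 - z) * math.log(1 - s)

def bce_stable(x, z):
    # Algebraically equal rearrangement that never exponentiates a large positive number
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

for x in (-3.0, -0.5, 0.0, 2.0):
    for z in (0.0, 0.3, 1.0):
        assert math.isclose(bce_naive(x, z), bce_stable(x, z), rel_tol=1e-9, abs_tol=1e-12)

# With x = 50, sigmoid(x) rounds to exactly 1.0, so the naive form would take log(0).
print(bce_stable(50.0, 0.0))  # about 50: a confident prediction of 1 for a pixel labelled 0
```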

self.optimizer = tf.train.RMSPropOptimizer(learning_rate=0.005).minimize(self.cost)
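Sorcery or not, the RMSProp update itself is short. Below is a minimal scalar sketch of one common formulation, applied to the toy loss f(w) = (w - 3)². Each step is divided by the square root of a running average of squared gradients. The `decay` and `eps` values are illustrative assumptions. TensorFlow’s `RMSPropOptimizer` may differ in details; for instance, it also supports momentum.

```python
def rmsprop_minimise(grad, w, lr=0.005, decay=0.9, eps=1e-10, steps=2000):
    """Plain RMSProp on a scalar parameter (illustrative sketch)."""
    ms = 0.0  # running mean of squared gradients
    for _ in range(steps):
        g = grad(w)
        ms = decay * ms + (1 - decay) * g * g
        w -= lr * g / (ms + eps) ** 0.5  # step size adapted to recent gradient magnitude
    return w

# Toy loss f(w) = (w - 3)^2: gradient 2 (w - 3), minimum at w = 3.
w_final = rmsprop_minimise(lambda w: 2 * (w - 3), w=0.0)
print(round(w_final, 2))
```

Because each step has roughly size `lr` once `ms` tracks the gradient, the parameter moves about 0.005 per step and ends up hovering within a few multiples of `lr` of the minimum.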

The last slightly technical concept is the session. This object acts as a context manager and as the connection to the backend that actually executes the graph. It needs to be created, and its variables initialised:

self.init_op = tf.global_variables_initializer()
# Reuse the default session if one is active, otherwise create a new one
self.sess = tf.get_default_session()
if self.sess is None:
    self.sess = tf.Session()
self.sess.run(self.init_op)

That’s it! Or nearly: now we need to fit the model. Worry not, this is very simple in TensorFlow:

# Prepare the batches
epochs = 10
batch_size = 64
n_batches = len(X) // batch_size

for i in range(epochs):
    # Shuffle the input data at the start of each epoch
    X_perm = np.random.permutation(X)
    for j in range(n_batches):
        # Load the data for the current batch
        batch = X_perm[j*batch_size:(j+1)*batch_size]
        # Run one training step on the batch
        _, cost = self.sess.run((self.optimizer, self.cost),
                                feed_dict={self.X: batch})
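One detail of the loop above: computing the number of batches with integer division silently drops the last partial batch in each epoch. With 1,000 samples and batches of 64, that gives 15 full batches and 40 unused samples per epoch. Since the data are reshuffled, different samples are left out each time. Below is a small pure-Python sketch of a batching generator with an option to keep the remainder. `iterate_minibatches` is a hypothetical helper, not part of the original code or of TensorFlow.

```python
import random

def iterate_minibatches(data, batch_size, shuffle=True, keep_last=False, seed=None):
    """Yield successive batches of `data` (hypothetical helper for illustration)."""
    indices = list(range(len(data)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # a fresh permutation, like np.random.permutation
    n_full = len(data) // batch_size
    stop = len(data) if keep_last else n_full * batch_size
    for start in range(0, stop, batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

samples = list(range(1000))
dropped = list(iterate_minibatches(samples, 64, seed=0))
kept = list(iterate_minibatches(samples, 64, seed=0, keep_last=True))
print(len(dropped), len(kept), len(kept[-1]))  # 15 16 40
```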

The last line is really the only interesting one. It tells TensorFlow to run one training step, with batch fed in as the placeholder X. Fetching self.optimizer applies one weight update using the given optimiser and loss. Fetching self.cost also returns the loss value for that batch.

Let’s see some examples of reconstructions given by this network:


Now this is all well and good, but so far we’ve only trained a network that can reconstruct its input. How can we actually use the auto-encoder? We need to define two more operations, encode and decode, which is actually very simple:

def encode(self, X):
    return self.sess.run(self.Z, feed_dict={self.X: X})

Here we tell TensorFlow to calculate Z, which, if you look back, is the tensor representing the encoding. The decoding is just as straightforward:

def decode(self, Z):
    return self.sess.run(self.X_hat, feed_dict={self.Z: Z})

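This time, we explicitly give TensorFlow the encoding through Z and tell it to calculate the predicted output, X_hat. We would have previously calculated that encoding with the encode function. This works because TensorFlow lets us feed a value for an intermediate tensor, not only for placeholders, so the encoder part of the graph does not need to run.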
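Because decode starts from Z, chaining the two functions goes through exactly the same computation as the full network. decode can also be fed latent vectors that did not come from encode. The NumPy sketch below illustrates both points with random, untrained stand-in weights. The `encode`/`decode` functions here are plain functions that mirror the methods above, not the TensorFlow versions.

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 784, 128
W1, b1 = rng.normal(scale=0.05, size=(D, d)), np.zeros(d)
W2, b2 = rng.normal(scale=0.05, size=(d, D)), np.zeros(D)

def encode(X):
    return np.maximum(X @ W1 + b1, 0.0)  # same expression as the tensor Z

def decode(Z):
    return 1.0 / (1.0 + np.exp(-(Z @ W2 + b2)))  # same expression as X_hat

def forward(X):
    # The full network in one go, for comparison.
    return 1.0 / (1.0 + np.exp(-(np.maximum(X @ W1 + b1, 0.0) @ W2 + b2)))

X = rng.random((3, D))
codes = encode(X)
assert np.allclose(decode(codes), forward(X))

# decode accepts any latent vector, e.g. the midpoint between two encodings
midpoint = decode((codes[0] + codes[1]) / 2)
print(codes.shape, midpoint.shape)  # (3, 128) (784,)
```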

As you can see, this is quite long even for a simple network. Sure, we could have parametrised it a bit and used lists instead of a separate variable for every weight. But what happens when we need to test multiple structures quickly or automatically? What about layer types other than dense? Worry not, the second (less pedantic) way lets us do all of that easily!
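To illustrate the “lists instead of single variables” idea, here is a NumPy sketch in which the architecture is just a list of layer sizes, so adding or removing a layer means editing one list. The sizes and the helper names `init_layers` and `forward` are hypothetical. The activations mirror the network above: ReLU in the hidden layers and sigmoid at the output.

```python
import numpy as np

def init_layers(sizes, rng):
    """One (W, b) pair per consecutive pair of sizes, e.g. [784, 128, 784] -> 2 layers."""
    return [(rng.normal(scale=0.05, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(X, layers):
    """ReLU on hidden layers, sigmoid on the last one; also returns every layer's output."""
    outputs = []
    for i, (W, b) in enumerate(layers):
        pre = X @ W + b
        X = 1.0 / (1.0 + np.exp(-pre)) if i == len(layers) - 1 else np.maximum(pre, 0.0)
        outputs.append(X)
    return X, outputs

rng = np.random.default_rng(2)
sizes = [784, 256, 128, 256, 784]  # a deeper, symmetric auto-encoder
layers = init_layers(sizes, rng)
X_hat, outputs = forward(rng.random((4, 784)), layers)

bottleneck = outputs[sizes.index(min(sizes)) - 1]  # output of the smallest layer
print(X_hat.shape, bottleneck.shape)  # (4, 784) (4, 128)
```

Returning every layer’s output is what makes the encoding retrievable here. The Keras version achieves the same thing by looking layers up by name.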

Keras API implementation

The simpler, the better — William of Ockham

The first way was long, pedantic, and, let’s be honest, a bit annoying. It also did not generalise easily. A simpler solution is available through the Keras interface included in TensorFlow (tf.keras).

The network definition, loss, optimiser, and fitting all fit in a few lines:

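import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# x: training images, flattened to 784 values and scaled to [0, 1]
t_model = Sequential()
t_model.add(Dense(256, activation='relu', input_shape=(784,)))
t_model.add(Dense(128, activation='relu', name='bottleneck'))
t_model.add(Dense(784, activation=tf.nn.sigmoid))
# The output already goes through a sigmoid, so use plain binary cross-entropy on it
t_model.compile(optimizer=tf.train.AdamOptimizer(0.001),
                loss='binary_crossentropy')
t_model.fit(x, x, batch_size=32, epochs=10)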
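For comparison with the low-level version, the parameter count of this Keras model can be computed by hand. With the layer sizes above, the total below should match the trainable parameters reported by `t_model.summary()`. This is a plain-Python check, not a Keras call.

```python
sizes = [784, 256, 128, 784]  # input, hidden, bottleneck, output

# Each Dense layer has an (n_in x n_out) kernel plus n_out biases.
per_layer = [n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]
print(per_layer, sum(per_layer))  # [200960, 32896, 101136] 334992
```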

That’s it, we’ve got a trained network. Isn’t life nice sometimes?

…but what about the whole encoding/decoding? Yeah, that’s when it gets slightly trickier, but worry not, your guide is here.
So, to be able to do that, we’re going to need a few things:

  • A session: the one that holds the trained weights
  • The input tensor, to specify the input in the feed_dict argument
  • The encoded tensor, to retrieve the encoding, and to use as input for the decoding feed_dict argument
  • The decoded/output tensor, to retrieve the decoded value

Abracadabra!

We achieve this by simply getting the required tensors from the layers of interest! Note that by naming the bottleneck layer, I make it very easy to retrieve it.

# Keras keeps its own session, in which the trained weights live
session = tf.keras.backend.get_session()

# Get input tensor
def get_input_tensor(model):
    return model.layers[0].input

# Get bottleneck tensor (the layer was named 'bottleneck' above)
def get_bottleneck_tensor(model):
    return model.get_layer(name='bottleneck').output

# Get output tensor
def get_output_tensor(model):
    return model.layers[-1].output

That’s it! Now, given a trained model, you can get all the tensors you need with these lines:

t_input = get_input_tensor(t_model)
t_enc = get_bottleneck_tensor(t_model)
t_dec = get_output_tensor(t_model)
session = tf.keras.backend.get_session()
# enc will store the actual encoded values of x
enc = session.run(t_enc, feed_dict={t_input:x})
# dec will store the actual decoded values of enc
dec = session.run(t_dec, feed_dict={t_enc:enc})

I hope you’ve enjoyed this post, and that it was useful or at least interesting. I originally wrote it mainly for the session part, which is not well documented. It took me hours to understand what tensors were, what the session is, and how to compute only part of the network in order to evaluate and retrieve specific tensors. All in all, though, it helped consolidate the various TensorFlow concepts, which are far from intuitive but very powerful once understood.

Thank you for reading, don’t hesitate to comment, and have a nice day!