A beginner’s guide to building a stacked autoencoder with tied weights.

Source: Deep Learning on Medium

An autoencoder is an artificial neural network that aims to learn a representation of a data set. It compresses the input into a latent-space representation and then reconstructs the output from that representation. Autoencoders are used for dimensionality reduction and feature detection, and they can also be used to generate new data from the extracted features.

An autoencoder has two main components. The encoder learns how to reduce the dimensions of the input data and compress it into the latent-space representation. The decoder learns how to decompress the data from the latent-space representation back to the output, which is close to the input but lossy. The latent-space representation layer, also known as the bottleneck layer, contains the important features of the data.

Stacked Autoencoder

In an autoencoder, the encoder and decoder are not limited to a single layer each. Both can be implemented as a stack of layers, hence the name stacked autoencoder. Stacked autoencoders are therefore simply deep autoencoders with multiple hidden layers. With more hidden layers, an autoencoder can learn more complex codings. However, we need to keep the complexity of the autoencoder in check so that it does not over-fit the training data.

In the architecture of a stacked autoencoder, the layers are typically symmetrical with regard to the central hidden layer.

Implementation of Stacked Autoencoder: Here we use the MNIST data set, which has 784 inputs per image. The encoder has a hidden layer of 392 neurons, followed by a central hidden layer of 196 neurons. As the model is symmetrical, the decoder also has a hidden layer of 392 neurons, followed by an output layer of 784 neurons.

Before going through the code, let us look at the libraries used in this example. We use TensorFlow 2.0.0, which includes Keras, along with the NumPy and Matplotlib libraries.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist

Next we load the MNIST handwritten digits data set, in which each image is 28 x 28 pixels. We load the images directly from the Keras API and will display a few of them for visualization.

#Loading the MNIST data
(X_train_orig, _), (X_test, _) = mnist.load_data()
all_data = np.concatenate((X_train_orig, X_test))
all_data.shape

Before going further, we need to prepare the data for our models. MNIST images are grayscale, so we normalize them by dividing the pixel intensities by the maximum value, 255, and then split the training images into training and validation sets. The show_reconstructions function displays original images alongside their reconstructions; we will call it after each model is trained.

# Normalize the pixel intensities by dividing by the maximum pixel value (255).
max_value = float(X_train_orig.max())
X_Train = X_train_orig.astype(np.float32) / max_value
X_Test = X_test.astype(np.float32) / max_value

# Train and validation split: hold out the last 7,000 training images.
X_train, X_valid = X_Train[:-7000], X_Train[-7000:]
X_train.shape, X_valid.shape

def plot_image(image):
    plt.imshow(image, cmap="binary")
    plt.axis("off")

# Displays the original images (top row) and their reconstructions (bottom row).
def show_reconstructions(model, images=X_valid, n_images=10):
    reconstructions = model.predict(images[:n_images])
    fig = plt.figure(figsize=(n_images * 1.5, 3))
    for image_index in range(n_images):
        plt.subplot(2, n_images, 1 + image_index)
        plot_image(images[image_index])
        plt.subplot(2, n_images, 1 + n_images + image_index)
        plot_image(reconstructions[image_index])
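To see what the scaling and the slicing do without downloading MNIST, here is a small sketch on synthetic data. The array `fake_images` and the hold-out size of 30 are made up for illustration and are not part of the original example.

```python
import numpy as np

# Synthetic stand-in for MNIST: 100 grayscale "images" of 28 x 28 uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_images[0, 0, 0] = 255  # make sure the maximum intensity is present

# Same scaling as in the article: divide by the largest pixel value.
scaled = fake_images.astype(np.float32) / float(fake_images.max())

# Same slicing pattern as in the article, holding out the last 30 images.
train, valid = scaled[:-30], scaled[-30:]

print(scaled.min() >= 0.0, scaled.max() == 1.0)  # True True
print(train.shape, valid.shape)  # (70, 28, 28) (30, 28, 28)
```

After scaling, every pixel lies in [0, 1], which is the range a sigmoid output layer can reproduce.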

Here we build the stacked autoencoder with the Keras functional API, using the structure described earlier (784-unit input layer, 392-unit hidden layer, 196-unit central hidden layer, 392-unit hidden layer and 784-unit output layer). The encoder starts by flattening the 2D input image and passes it to a dense layer of 392 neurons. The central hidden layer has 196 neurons, far fewer than the 784 inputs, so that only the important features are retained. The decoder mirrors the encoder: it has a dense layer of 392 neurons followed by an output layer of 784 neurons, whose output is reshaped to 28 x 28 to match the input image.

#Stacked Autoencoder with functional model
#encoder
inputs = keras.Input(shape=(28,28))
lr_flatten = keras.layers.Flatten()(inputs)
lr1 = keras.layers.Dense(392, activation="selu")(lr_flatten)
lr2 = keras.layers.Dense(196, activation="selu")(lr1)
#decoder
lr3 = keras.layers.Dense(392, activation="selu")(lr2)
lr4 = keras.layers.Dense(28 * 28, activation="sigmoid")(lr3)
outputs = keras.layers.Reshape([28, 28])(lr4)
stacked_ae = keras.models.Model(inputs,outputs)
stacked_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.5))
stacked_ae.summary()

After creating the model, we fit it on the training data, monitor it on the validation data, and then reconstruct a few validation images.

h_stack = stacked_ae.fit(X_train, X_train, epochs=20, validation_data=(X_valid, X_valid))
show_reconstructions(stacked_ae)

Tying weights

To understand the concept of tying weights, we need to answer three questions about it: what, why and when. Let’s start with when to use it. When the autoencoder is symmetrical, it is common practice to tie the weights. What is it? It means tying the weights of each decoder layer to the weights of the corresponding encoder layer, so the decoder uses the transpose of the encoder’s weight matrix. Why do we need it? It reduces the number of weights in the model to almost half of the original, which reduces the risk of over-fitting and speeds up training.
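The idea can be sketched in a few lines of NumPy before moving to Keras. This is a minimal illustration for a single encoder/decoder pair with the activations left out; the names `W`, `b_enc` and `b_dec` and the batch of random inputs are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical single encoder layer: 784 inputs -> 392 units.
W = rng.normal(scale=0.01, size=(784, 392))  # the one shared weight matrix
b_enc = np.zeros(392)                        # encoder bias
b_dec = np.zeros(784)                        # the decoder keeps its own bias

x = rng.random((5, 784))                     # batch of 5 flattened "images"

h = x @ W + b_enc          # encode with W
x_hat = h @ W.T + b_dec    # decode with the transpose of the same W

print(h.shape, x_hat.shape)       # (5, 392) (5, 784)
print(np.shares_memory(W, W.T))   # True: no second weight matrix is stored
```

Because `W.T` is only a view of `W`, the decoder adds no weights of its own beyond its bias vector.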

Implementation of Tying Weights: To implement tying weights, we need to create a custom Keras layer that ties weights between layers. This custom layer acts like a regular dense layer, but it uses the transposed weights of the encoder’s dense layer while keeping its own bias vector.

class DenseTranspose(keras.layers.Layer):
    def __init__(self, dense, activation=None, **kwargs):
        # Keep a reference to the encoder layer whose kernel is reused.
        self.dense = dense
        self.activation = keras.activations.get(activation)
        super().__init__(**kwargs)

    def build(self, batch_input_shape):
        # Only the bias is new; its size is the input size of the tied dense layer.
        self.biases = self.add_weight(name="bias", initializer="zeros",
                                      shape=[self.dense.input_shape[-1]])
        super().build(batch_input_shape)

    def call(self, inputs):
        # Multiply by the transpose of the encoder layer's kernel.
        z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)
        return self.activation(z + self.biases)

The structure of this model is very similar to the stacked autoencoder above. The only difference is that the decoder’s dense layers are tied to the encoder’s dense layers, which is achieved by passing each encoder dense layer as an argument to the DenseTranspose class defined earlier.

dense_1 = keras.layers.Dense(392, activation="selu")
dense_2 = keras.layers.Dense(196, activation="selu")
#tied_encoder
inputs = keras.Input(shape=(28,28))
l_flatten = keras.layers.Flatten()(inputs)
l_en1 = dense_1(l_flatten)
l_en2 = dense_2(l_en1)
#tied_decoder
l_dc1 = DenseTranspose(dense_2, activation="selu")(l_en2)
l_dc2 = DenseTranspose(dense_1, activation="sigmoid")(l_dc1)
outputs = keras.layers.Reshape([28, 28])(l_dc2)
tied_ae = keras.models.Model(inputs, outputs)
tied_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.5))
tied_ae.summary()

Now we fit the model and reconstruct the output to compare it with the input images.

# fit the model
h_tied = tied_ae.fit(X_train, X_train, epochs=20, validation_data=(X_valid, X_valid))
# reconstruct the image
show_reconstructions(tied_ae)
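Besides looking at the reconstructions, the two training runs can be compared numerically. In Keras, `fit` returns a History object whose `history` attribute is a dict of per-epoch lists such as "loss" and "val_loss". The helper below is a hypothetical sketch, not part of Keras, and the numbers in `example_history` are made up so that the snippet runs without training anything.

```python
def summarize_history(history_dict):
    """Return (best_epoch, best_val_loss, final_train_loss) from a Keras-style history dict.

    Epochs are reported 1-based. Hypothetical helper, not part of Keras.
    """
    val_losses = history_dict["val_loss"]
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1, val_losses[best_index], history_dict["loss"][-1]

# Made-up numbers standing in for h_stack.history or h_tied.history.
example_history = {
    "loss": [0.30, 0.20, 0.15, 0.12],
    "val_loss": [0.28, 0.19, 0.16, 0.17],
}

print(summarize_history(example_history))  # (3, 0.16, 0.12)
```

With the trained models, the same helper could be called as summarize_history(h_stack.history) and summarize_history(h_tied.history). A validation loss that starts rising while the training loss keeps falling is a common sign of over-fitting.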

Conclusion

From the summaries of the two models above, we can observe that the number of parameters in the tied-weights model (385,924) is almost half that of the stacked autoencoder model (770,084). With this reduction in parameters, we can reduce the risk of over-fitting and speed up training. We can also observe that the output images are very similar to the input images, which implies that the latent representation retains most of the information in the input images.
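The two parameter counts can be reproduced by hand: a dense layer with n_in inputs and n_out units holds n_in * n_out weights plus n_out biases, while a tied decoder layer adds only its bias vector. The following plain-Python sketch works through that arithmetic; the variable names are illustrative.

```python
encoder_sizes = [784, 392, 196]  # input, hidden layer, central hidden layer

# Encoder dense layers: weights plus biases.
encoder_params = sum(n_in * n_out + n_out
                     for n_in, n_out in zip(encoder_sizes[:-1], encoder_sizes[1:]))

# An untied decoder mirrors the encoder and has its own weights and biases.
decoder_sizes = encoder_sizes[::-1]
untied_decoder_params = sum(n_in * n_out + n_out
                            for n_in, n_out in zip(decoder_sizes[:-1], decoder_sizes[1:]))

# A tied decoder reuses the encoder weights and only adds bias vectors.
tied_decoder_params = sum(decoder_sizes[1:])

stacked_total = encoder_params + untied_decoder_params
tied_total = encoder_params + tied_decoder_params

print(stacked_total)  # 770084
print(tied_total)     # 385924
print(round(tied_total / stacked_total, 3))  # 0.501
```

The ratio is slightly above one half because the decoder biases (392 + 784 values) are still trained separately.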

Thanks for reading.

Reference

  1. https://blog.keras.io/building-autoencoders-in-keras.html
  2. https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ch17.html