Autoencoder neural networks: what and how?


Parts list

Here’s the basic list of things we’ll need to create.

  1. input data — what is getting encoded and decoded?
  2. an encoding function — there needs to be a network that takes an input and encodes it.
  3. a decoding function — there needs to be a network that takes the encoded input and decodes it.
  4. loss function — the autoencoder is doing well when the decoded output is very close to the original input (the loss is small), and badly when the reconstruction looks nothing like it (see the quick sketch just after this list).
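
As a toy illustration of that last point (plain numpy, separate from the network we’ll build; mean squared error here just to keep it simple, even though we’ll use binary crossentropy later):

import numpy as np

x = np.array([0.0, 0.5, 1.0])        # original input
r_good = np.array([0.1, 0.5, 0.9])   # a reconstruction that is close to x
r_bad = np.array([1.0, 0.0, 0.0])    # a reconstruction that looks nothing like x

print(np.mean((x - r_good) ** 2))    # small loss -> good autoencoder
print(np.mean((x - r_bad) ** 2))     # large loss -> bad autoencoder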

The Approach

The simplest autoencoder looks something like this: x → h → r, where an encoding function f maps x to the hidden representation h, and a decoding function g maps h to the reconstruction r. Since we’re using neural networks, we don’t have to design f and g by hand; the network learns them during training.
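
To make the notation concrete, here’s a toy sketch in plain Python (just an illustration, not the model we’ll build): the “encoder” simply truncates the input and the “decoder” pads it back out with zeros, which is a terrible autoencoder but shows how f and g compose.

# Toy illustration only: f "encodes" by truncating, g "decodes" by zero-padding.
def f(x):
    return x[:32]                       # h: a smaller code

def g(h):
    return h + [0.0] * (784 - len(h))   # r: a reconstruction the same size as x

x = [0.5] * 784   # a stand-in for one flattened input
h = f(x)          # encode
r = g(h)          # decode; a good autoencoder makes r close to x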

Logically, step 1 will be to get some data. We’ll grab MNIST from the Keras datasets library. It consists of 60,000 training examples and 10,000 test examples of handwritten digits 0–9. Next, we’ll do some basic data preparation so that we can feed it into our neural network as our input set, x.

Then in step 2, we’ll build the basic neural network model that gives us hidden layer h from x.

  1. We’ll put together a single dense hidden layer that takes x as input and uses a ReLU activation.
  2. Next, we’ll pass the output of this layer into another dense layer and run its output through a sigmoid activation.

Once we have a model, we’ll be able to train it in step 3, and then in step 4, we’ll visualize the output.

Let’s put it together:

First, let’s not forget the necessary imports to help us create our neural network (keras), do standard matrix mathematics (numpy), and plot our data (matplotlib). We’ll call this step 0.

# Importing modules to create our layers and model.
from keras.layers import Input, Dense
from keras.models import Model
# Importing standard utils
import numpy as np
import matplotlib.pyplot as plt

Step 1. Import our data, and do some basic data preparation. Since we’re not going to use labels here, we only care about the x values.

from keras.datasets import mnist

(train_xs, _), (test_xs, _) = mnist.load_data()
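
If you’d like to sanity-check what came back (an optional step), the arrays hold 28 x 28 greyscale images:

print(train_xs.shape, test_xs.shape)  # (60000, 28, 28) (10000, 28, 28)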

Next, we’ll normalize them to between 0 and 1. Since they’re greyscale images with pixel values between 0 and 255, we’ll represent the input as float32 values and divide by 255. A pixel of 255 becomes 255.0/255.0 = 1.0, a pixel of 0 becomes 0.0, and so on.

# Note the '.' after 255: it tells Python to treat 255 as a float rather than an integer, so the division stays floating point.
train_xs = train_xs.astype('float32') / 255.
test_xs = test_xs.astype('float32') / 255.

Now think about this: we have images that are 28 x 28, with values between 0 and 1, and we want to pass them into a neural network layer as an input vector. What should we do? We could use a convolutional neural network, but in this simple case we’ll just use a dense layer. So how do we feed the images in? We’ll flatten each one into a one-dimensional vector of 784 values (28 x 28).

train_xs = train_xs.reshape(len(train_xs), np.prod(train_xs.shape[1:]))
test_xs = test_xs.reshape(len(test_xs), np.prod(test_xs.shape[1:]))

Step 2. Let’s put together a basic network. We’re simply going to create an encoding network, and a decoding network. We’ll put them together into a model called the autoencoder below. We’ll also decrease the size of the encoding so we can get some of that data compression. Here we’ll use 32 to keep it simple.

# Size of the encoded representation. Squeezing the input through a smaller encoding layer
# forces the network to learn salient features. Choosing 784 would mean a compression factor of 1, i.e. no compression at all.
encoding_dim = 32
input_img = Input(shape=(784, ))
# Size of the output. We want to reconstruct 28 x 28 images in the end, which is 784 values.
output_dim = 784
encoded = Dense(encoding_dim, activation='relu')(input_img)
decoded = Dense(output_dim, activation='sigmoid')(encoded)

Now create a model that takes input_img as its input and outputs the decoded layer. Then compile the model, in this case with adadelta as the optimizer and binary_crossentropy as the loss.

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
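
If you want to double-check the architecture before training, Keras can print a layer-by-layer summary (optional):

autoencoder.summary()  # expect Input (784) -> Dense(32, relu) -> Dense(784, sigmoid)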

Step 3. Our model is ready to train. You’ll be able to run this without a GPU; it doesn’t take long. We’ll call fit on the autoencoder model we created, passing in the x values as both the inputs and the targets, for 50 epochs with a relatively large batch size (256), which helps it train reasonably quickly. We’ll enable shuffle so each batch isn’t homogeneous, and we’ll use the test values as validation data.

autoencoder.fit(train_xs, train_xs, epochs=50, batch_size=256, shuffle=True, validation_data=(test_xs, test_xs))

That’s it. Autoencoder done. Watch the loss drop as training proceeds; the lower it gets, the more closely the decoded output matches the original input. But can’t we take a look at the reconstructions for ourselves?
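
If you’d also like a single number for reconstruction quality, you can compute the loss on the test set directly with evaluate (optional):

test_loss = autoencoder.evaluate(test_xs, test_xs, verbose=0)
print(test_loss)  # the binary crossentropy between the test inputs and their reconstructions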

Step 4. To see the reconstructions, we’ll run inference on our input data and then display the results with matplotlib. The predict method handles the inference for us.

Here’s the thought process: take our test inputs, run them through autoencoder.predict, then show the originals and the reconstructions.

# Run your predictions and store them in a decoded_images list. 
decoded_images = autoencoder.predict(test_xs)

The top row is the inputs, and the bottom row is the reconstruction from our autoencoder model.

Here’s how you get that image above:

# We'll plot 10 images. 
n = 10
plt.figure(figsize=(16, 3))
for i in range(n):
    # Show the originals
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(test_xs[i].reshape(28, 28))
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Show the reconstructions
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_images[i].reshape(28, 28))
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()