Binary Classification of IMDB Movie Reviews



Building the Neural Network

Our input data consists of vectors that need to be mapped to scalar labels (0s and 1s). This is one of the simplest setups, and a plain stack of fully connected (Dense) layers with relu activation performs quite well.

Hidden layers

In this network, we will leverage hidden layers. We will define our layers as follows:

Dense(16, activation='relu')

The argument passed to each Dense layer (16) is the number of hidden units in the layer.

The output from a Dense layer with relu activation is generated after a chain of tensor operations. This chain of operations is implemented as

output = relu(dot(W, input) + b)

where W is the weight matrix and b is the bias (a tensor).

Having 16 hidden units means that the matrix W will have the shape (input_dimension, 16). In this case, where the dimension of the input vector is 10,000, the shape of the weight matrix will be (10000, 16). If you were to represent this network as a graph, you would see 16 neurons in this hidden layer.

To put it in layman’s terms, there will be 16 balls in this layer.

Each of these balls, or hidden units, is a dimension in the representation space of the layer. The representation space is the set of all viable representations of the data. Each hidden layer, through its hidden units, aims to learn one specific transformation of the data, i.e. one feature or pattern in the data.
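To make the operation above concrete, here is a rough NumPy sketch of a single forward pass through this layer (the random W and b stand in for learned parameters; this is an illustration, not the Keras implementation):

import numpy as np

def relu(x):
    return np.maximum(x, 0.)  # zeroes out negative values

input_dimension = 10000
W = np.random.random((input_dimension, 16))  # weight matrix, shape (10000, 16)
b = np.random.random((16,))                  # bias vector, one entry per hidden unit

x = np.random.random((input_dimension,))     # a single dummy input vector
output = relu(np.dot(x, W) + b)              # dot(W, input) from the text, with shapes arranged so the product is valid
print(output.shape)                          # (16,): one value per hidden unit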

DeepAI.org has a very informative write-up on hidden layers.

Hidden layers, simply put, are layers of mathematical functions each designed to produce an output specific to an intended result.

Hidden layers allow for the function of a neural network to be broken down into specific transformations of the data. Each hidden layer function is specialized to produce a defined output. For example, hidden layer functions that are used to identify human eyes and ears may be used in conjunction by subsequent layers to identify faces in images. While the functions to identify eyes alone are not enough to independently recognize objects, they can function jointly within a neural network.

ReLU activation function. This is one of the most commonly used activation functions.

Model Architecture

For our model, we will use

  1. Two intermediate layers with 16 hidden units each
  2. A third layer that will output the scalar sentiment prediction
  3. Intermediate layers will use the relu activation function. relu, or Rectified Linear Unit, zeroes out negative values.
  4. Sigmoid activation for the final (output) layer. A sigmoid function “squashes” arbitrary values into the [0, 1] range (both activations are sketched in code below).
The Sigmoid Activation Function. (Source: Wikipedia, by Qef)
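For reference, both activations fit in a couple of lines of NumPy (a minimal sketch, not the Keras implementation):

import numpy as np

def relu(x):
    return np.maximum(x, 0.)       # negative values become 0, positive values pass through

def sigmoid(x):
    return 1. / (1. + np.exp(-x))  # squashes any real value into the (0, 1) range

print(relu(np.array([-2., 0., 3.])))     # [0. 0. 3.]
print(sigmoid(np.array([-2., 0., 3.])))  # approximately [0.12 0.5 0.95]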

There are formal principles that guide our approach in selecting the architectural attributes of a model. These are not covered in this case study.

Defining the model architecture
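A minimal Keras sketch of this architecture could look like the following (assuming, as described above, that each review has been vectorized into a 10,000-dimensional input; depending on your setup the imports may come from tensorflow.keras instead of keras):

from keras import models
from keras import layers

model = models.Sequential()
# Two intermediate layers with 16 hidden units each, using relu activation
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
# Output layer: a single sigmoid unit producing the scalar sentiment prediction
model.add(layers.Dense(1, activation='sigmoid'))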

Compiling the model

In this step, we will choose an optimizer, a loss function, and metrics to observe. We will go forward with:

  • the binary_crossentropy loss function, commonly used for binary classification,
  • the rmsprop optimizer, and
  • accuracy as a measure of performance

We can pass our choices for the optimizer, loss function, and metrics as strings to the compile function because rmsprop, binary_crossentropy, and accuracy come packaged with Keras.

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

One could use a customized loss function or optimizer by passing a custom class instance as an argument to the loss, optimizer, or metrics field.

In this example, we will keep our default choices, but we will do so by passing class instances. This is precisely how we would do it if we had customized parameters.

Compiling the model
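A sketch of the same compilation step with class instances in place of strings (the learning rate shown is only an example of a customized parameter; older Keras versions use lr instead of learning_rate):

from keras import optimizers
from keras import losses
from keras import metrics

model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
              loss=losses.binary_crossentropy,
              metrics=[metrics.binary_accuracy])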