Mastering Linear Algebra in TensorFlow — Part 1: Tabular Data

One of the most time-consuming hurdles for beginners is handling the matrix operations behind different deep learning architectures. In other words, how well you handle the linear algebra reflects how well you understand the mechanics of matrix calculation. If you have little background on matrices or tensors in linear algebra, check out Vihar Kurama’s post on Medium! To minimize confusion, both TensorFlow 1.X and TensorFlow 2.X are included in this tutorial. All code is executable on Google Colab so you can get into the topic in the shortest time.

Major data formats

Before diving into the details: in general, there are four major types of input data format (a shape sketch follows the list):

1. Vector Data

(1) Data shape: (data samples * features)

2. Timeseries Data or Sequence Data

(1) Data shape: (data samples * timestamps * features)

For sequence data, timestamps stands for the sequence length of your input data, and the length should be fixed before feeding the data into the model; usual methods include zero-padding and slicing. For time series data, each timestamp represents one time period.

3. Image Data

(1) Data shape: (data samples * height * width * channels)

Height and width refer to the shape of the image. Channels represents things like RGB (3 channels), greyscale (1 channel), etc.

(2) Data shape: (data samples * features) if height * width * channels is flattened

4. Video Data

(1) Data shape: (data samples * frames * height * width * channels)

(2) Data shape: (data samples * features) if frames * height * width * channels is flattened
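
To make these shapes concrete, here is a minimal sketch with made-up sizes (NumPy arrays stand in for tensors; all dimensions are purely illustrative):

```python
import numpy as np

# Illustrative tensors for the four data formats (all sizes are made up)
vector_data   = np.zeros((100, 4))              # (samples, features)
sequence_data = np.zeros((100, 20, 4))          # (samples, timestamps, features)
image_data    = np.zeros((100, 28, 28, 3))      # (samples, height, width, channels)
video_data    = np.zeros((100, 16, 28, 28, 3))  # (samples, frames, height, width, channels)

# Flattening collapses height * width * channels into a single feature axis
flat_images = image_data.reshape(100, -1)       # (100, 28 * 28 * 3) = (100, 2352)
```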

Vector Data

Today we will focus on vector data and the linear algebra involved in building this type of model. Note that data preprocessing is mostly skipped, since it is not what we are focusing on this time. Vector data is also known as a 2D tensor: if a single sample is a vector, then vector data is simply what we get when we handle more than one sample at a time. 2D vector data is the most common input in machine learning, and we sometimes flatten higher-dimensional matrices such as image data into 2D vectors as well.

Vector Data — Classification Problems

1. Data Preparation

Suppose we are implementing a 3-layer multilayer perceptron for classification on the Iris dataset. First, we import some necessary packages:

On TensorFlow 1.X:
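
A minimal sketch of what the imports and data loading could look like, assuming scikit-learn supplies the Iris dataset and NumPy one-hot encodes the labels:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.X
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data.astype(np.float32)               # shape: (150, 4)
y = np.eye(3)[iris.target].astype(np.float32)  # one-hot labels, shape: (150, 3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```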

On TensorFlow 2.X:
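
A comparable sketch for TensorFlow 2.X, again assuming scikit-learn for the data:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 2.X
from tensorflow.keras import layers, Model
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data.astype(np.float32)
y = np.eye(3)[iris.target].astype(np.float32)  # one-hot labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```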

2. Model Architecture Preparation

Then, we assign the dimension of the data in each layer:
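
The names below follow the ones used in the rest of this post; Iris has 4 features and 3 classes, and the hidden sizes are set to 4 as explained in the next step:

```python
feature_size = 4  # number of input features in Iris
hidden_1 = 4      # output dimension of the first hidden layer
hidden_2 = 4      # output dimension of the second hidden layer
n_class = 3       # number of target classes in Iris
```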

3. Model Building (input and hidden layers)

For convenience, we set the output shape of both hidden layers to the same dimension, which is 4. n_class refers to the number of classes of the target feature. Then, let’s first handle the tricky parts: the inputs and the labels.

On TensorFlow 1.X:
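
A sketch of the inputs, labels, and first layer in TensorFlow 1.X (the weight initialization is an assumption; the names x, w1, b1, and targets follow the prose):

```python
# Placeholders: None leaves the batch size flexible
x = tf.placeholder(tf.float32, shape=[None, feature_size])
targets = tf.placeholder(tf.float32, shape=[None, n_class])

# Weights are shaped (input_dim, output_dim) so that tf.matmul(x, w1) works
w1 = tf.Variable(tf.random_normal([feature_size, hidden_1]))
b1 = tf.Variable(tf.zeros([hidden_1]))
layer1 = tf.nn.relu(tf.add(tf.matmul(x, w1), b1))  # (batch, 4) @ (4, 4) -> (batch, 4)
```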

By TensorFlow’s design, we prefer feature_size to be the second dimension, with None on the first dimension to allow any number of samples (the batch size). For the weights (w1), we use the shape [feature_size, hidden_1]. As for the bias, the shape is [hidden_1, ]. Thus, for layer1 we do tf.add(tf.matmul(x, w1), b1). But why not W*x + b? Why does x*W make better sense? Let’s look at the figure below:

TensorFlow expects the output to be of shape (number of examples, num_classes). Hence, doing tf.matmul(x, W) fits TensorFlow’s API format more intuitively. Still, you may see some tutorials, such as Andrew Ng’s Deeplearning.ai courses, use tf.matmul(W, x). However, a transpose operation after the dot product is needed for this kind of format (tf.transpose(tf.matmul(W, x))):
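
The two conventions are equivalent up to a transpose, as this small sketch shows:

```python
# x: (batch, features), w1: (features, hidden)
xw = tf.matmul(x, w1)                                            # (batch, hidden)
# W*x convention: transpose everything, multiply, then transpose back
wx = tf.transpose(tf.matmul(tf.transpose(w1), tf.transpose(x)))  # also (batch, hidden)
```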

For this reason, it becomes easier to see why the following layers look like this:
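
Each weight matrix keeps the (input_dim, output_dim) pattern (the initialization is again an assumption; output_logits matches the name used later in the prediction step):

```python
w2 = tf.Variable(tf.random_normal([hidden_1, hidden_2]))
b2 = tf.Variable(tf.zeros([hidden_2]))
layer2 = tf.nn.relu(tf.add(tf.matmul(layer1, w2), b2))   # (batch, 4)

w_out = tf.Variable(tf.random_normal([hidden_2, n_class]))
b_out = tf.Variable(tf.zeros([n_class]))
output_logits = tf.add(tf.matmul(layer2, w_out), b_out)  # (batch, 3)
```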

On TensorFlow 2.X:
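
A functional API sketch of the same network (the softmax activation on the output layer is an assumption that pairs with the categorical cross-entropy loss later):

```python
inputs = tf.keras.Input(shape=(feature_size,))  # note: (feature_size, ), no batch dimension
h1 = layers.Dense(hidden_1, activation="relu")(inputs)
h2 = layers.Dense(hidden_2, activation="relu")(h1)
outputs = layers.Dense(n_class, activation="softmax")(h2)
model = Model(inputs=inputs, outputs=outputs)
```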

TensorFlow 2.X is much easier: you just need to be careful with the input layer, whose shape should be (feature_size, ). For the rest, you only need to specify the output shape of each layer and TensorFlow 2.X completes the rest for you. Note that the code above is in the functional API format, which keeps more flexibility, but you can still use the common Sequential style since I provide both.
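
For reference, the equivalent Sequential version might look like this:

```python
model = tf.keras.Sequential([
    tf.keras.Input(shape=(feature_size,)),
    layers.Dense(hidden_1, activation="relu"),
    layers.Dense(hidden_2, activation="relu"),
    layers.Dense(n_class, activation="softmax"),
])
```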

4. Model Building (optimizer and prediction)

Prediction is also a difficult part for beginners, especially when they see syntax like tf.argmax and tf.reduce_mean everywhere.

On TensorFlow 1.X:
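
A sketch of the loss, optimizer, and accuracy ops in TensorFlow 1.X (the choice of loss function and learning rate are assumptions; the accuracy lines follow the prose below):

```python
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=targets, logits=output_logits))
optimizer = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)

correct_prediction = tf.equal(tf.argmax(targets, 1), tf.argmax(output_logits, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```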

tf.argmax is a function that returns the index of the largest value in an array along a given axis. The axis argument specifies which dimension you are comparing. For example, axis = 1 compares along the second dimension, which here is the class dimension of the predictions for the Iris dataset. To illustrate clearly, we look at the figure:

After tf.equal(tf.argmax(targets, 1), tf.argmax(output_logits, 1)), we receive a list of Boolean values indicating where the two indices match, that is to say, the correct predictions. Then tf.reduce_mean averages the results (after casting the Booleans to floats), which gives the total accuracy!
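
A tiny worked example with made-up values (in TF 2.X eager mode these evaluate immediately; in TF 1.X you would run them inside a session):

```python
targets_ex = tf.constant([[1., 0., 0.],
                          [0., 1., 0.],
                          [0., 0., 1.]])
logits_ex = tf.constant([[2.0, 0.1, 0.3],   # argmax -> 0 (correct)
                         [0.2, 0.1, 1.5],   # argmax -> 2 (wrong)
                         [0.3, 0.2, 3.1]])  # argmax -> 2 (correct)
match = tf.equal(tf.argmax(targets_ex, 1), tf.argmax(logits_ex, 1))  # [True, False, True]
acc = tf.reduce_mean(tf.cast(match, tf.float32))                     # 0.666...
```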

On TensorFlow 2.X:

TensorFlow 2.X has a very similar syntax here:
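
With Keras, compile and fit wrap the same argmax-and-average logic behind the "accuracy" metric (the epochs and batch size here are illustrative):

```python
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # pairs with one-hot labels
              metrics=["accuracy"])             # internally: argmax match, then mean
model.fit(X_train, y_train, epochs=50, batch_size=16,
          validation_data=(X_test, y_test))
```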

That’s all for the tutorial on tabular data. Next, we will start to handle the linear algebra of sequence data and time series data!