TensorFlow, Sequential and Functional Models

In this blog, we will compare the performance of TensorFlow Keras Sequential models and Functional models, and also compare different versions of TensorFlow.

First, we will use the MNIST dataset to train our model. We will work with the Sequential API and compare TensorFlow versions 1.15.0 and 2.0.0. For both versions, we will train for 50 epochs. The setup code is as follows:

try:
    %tensorflow_version 1.x
except Exception:
    pass

import tensorflow as tf
print(tf.__version__)
assert tf.__version__.startswith('1')

(%tensorflow_version 1.x and assert tf.__version__.startswith('1') will change to %tensorflow_version 2.x and assert tf.__version__.startswith('2') for TensorFlow version 2.0.0.)
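For reference, here is a minimal sketch of the same setup cell for TensorFlow 2.0.0, assuming a Colab runtime where the %tensorflow_version magic is available:

try:
    %tensorflow_version 2.x
except Exception:
    pass

import tensorflow as tf
print(tf.__version__)
assert tf.__version__.startswith('2')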

from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model
import matplotlib.pyplot as plt

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale pixel values to the [0, 1] range
x_train, x_test = x_train / 255.0, x_test / 255.0

# A linear classifier: flatten each 28x28 image and apply a single softmax layer
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=50)
model.evaluate(x_test, y_test)

For TensorFlow version 1.15.0, we get an average epoch time of 5s (80us/sample) and a best accuracy of 93.27%. For TensorFlow version 2.0.0, we get an average epoch time of 4s (65us/sample) and a best accuracy of 93.40%. When training the Sequential Model, the results indicate that the more recent version of TensorFlow is faster but gives roughly the same accuracy of approximately 93% on the MNIST dataset. We will use this result as a baseline for the Functional Model that we define next.
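(As an aside, these per-epoch timings are read off the Keras progress bar printed by model.fit. A minimal sketch of how the timing could be recorded explicitly with a callback is shown below; the EpochTimer class is our own illustration, not part of the original code.)

import time

class EpochTimer(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self.t0 = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print('Epoch {} took {:.2f}s'.format(epoch + 1, time.time() - self.t0))

# model.fit(x_train, y_train, epochs=50, callbacks=[EpochTimer()])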

Now, we will train the Functional Model, which we define by subclassing tf.keras.Model. We will also define our own softmax layer for the model. We will use the same MNIST dataset and measure the training time and accuracy. Unfortunately, the model defined this way cannot be run with TensorFlow version 1.15.0, so we report results for version 2.0.0 only. The setup code is as follows:

import time  # used to time each epoch in the training loop below

# Download a dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Batch and shuffle the data
train_ds = tf.data.Dataset.from_tensor_slices(
    (x_train.astype('float32') / 255, y_train)).shuffle(1024).batch(32)

test_ds = tf.data.Dataset.from_tensor_slices(
    (x_test.astype('float32') / 255, y_test)).batch(32)

loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

def my_softmax(x):
    # x is of size [batch_size, class]
    batch_size = len(x)          # 32
    number_of_class = len(x[0])  # 10

    # Get the max logit for each row
    max_tensors = tf.reduce_max(x, axis=1)  # shape (32,)

    # Stack the max logit value 10 times along the column
    max_tensors = tf.stack([max_tensors] * number_of_class, axis=1)  # shape (32, 10)

    # Subtract each logit by its corresponding max logit
    subtracted_tensors = x - max_tensors  # shape (32, 10)

    # For each subtracted logit, apply logit = e^logit
    exp_of_subtracted_tensors = tf.exp(subtracted_tensors)  # shape (32, 10)

    # For each row, sum e^logit
    sum_of_exp_of_subtracted_tensors = tf.reduce_sum(exp_of_subtracted_tensors, axis=1)  # shape (32,)

    # Stack the sum 10 times along the column
    sum_of_exp_of_subtracted_tensors = tf.stack([sum_of_exp_of_subtracted_tensors] * number_of_class, axis=1)  # shape (32, 10)

    # Divide each e^logit by the sum over its corresponding row
    return tf.divide(exp_of_subtracted_tensors, sum_of_exp_of_subtracted_tensors)  # shape (32, 10)

class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.flatten = Flatten()
        self.d1 = Dense(10)

    def call(self, x):
        x = self.flatten(x)
        x = self.d1(x)
        return my_softmax(x)

subclass_linear_model_softmax_from_scratch = MyModel()

@tf.function
def train_step_2(images, labels):
    with tf.GradientTape() as tape:
        predictions = subclass_linear_model_softmax_from_scratch(images)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, subclass_linear_model_softmax_from_scratch.trainable_variables)
    optimizer.apply_gradients(zip(gradients, subclass_linear_model_softmax_from_scratch.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)

EPOCHS = 50

for epoch in range(EPOCHS):
    t0 = time.time()
    for images, labels in train_ds:
        train_step_2(images, labels)

    template = 'Epoch {}, Training Time: {}, Loss: {}, Accuracy: {}'
    print(template.format(epoch + 1,
                          time.time() - t0,
                          train_loss.result(),
                          train_accuracy.result() * 100))

    # Reset the metrics for the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
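As a quick sanity check (a sketch we add here, not part of the original code), my_softmax should agree with the built-in tf.nn.softmax, since subtracting the row-wise maximum before exponentiating does not change the result (softmax(x) = softmax(x - max(x))) and only improves numerical stability:

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.0, 5.0, 1.0]])

custom = my_softmax(logits)               # each row sums to 1
builtin = tf.nn.softmax(logits, axis=1)

# The largest element-wise difference should be close to 0
print(tf.reduce_max(tf.abs(custom - builtin)).numpy())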

For TensorFlow version 2.0.0, we get an average epoch time of 3s (87us/sample) and a best accuracy of 93.92%. Compared to the Sequential Model's result of 4s (65us/sample), each epoch is approximately 1s faster. There is not a huge difference in accuracy, but there is a slight improvement with the Functional Model.

From this experiment, we can conclude the following:

  1. With the more recent TensorFlow version, training becomes faster, but there is no improvement in accuracy.
  2. Compared to the Sequential Model, the Functional Model shows a significant improvement in training time and a slight improvement in accuracy.