Diving into TensorBoard

Original article was published by Keshav Aggarwal on Artificial Intelligence on Medium


TensorFlow is one of the most popular deep learning frameworks, and with the release of TensorFlow 2.0, Keras has been integrated into the TensorFlow ecosystem. This has greatly improved TensorFlow's ease of use. TensorFlow provides many functionalities and libraries for deep learning, and TensorBoard is one of them.

TensorBoard is a tool for analyzing, visualizing, and debugging training. It is part of TensorFlow but can also be installed separately.

The outline of our journey with TensorBoard is as follows:

  1. Getting Started with TensorBoard
  2. Analyzing the Scalars and Metrics
  3. Plotting Image Data
  4. Analyzing the Graph Model
  5. Hyperparameter Tuning

1. Getting Started with TensorBoard

TensorBoard works through callbacks during model training. To use TensorBoard during training, include the TensorBoard callback in the model.fit call.

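import tensorflow as tf

log_dir = "logs"
# The callback writes training logs into log_dir
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
model = tf.keras.Sequential([
    # YOUR MODEL LAYERS
])
model.compile()  # pass your optimizer, loss and metrics here
model.fit(x, y, callbacks=[tensorboard_callback])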

After this, training logs will be generated in log_dir. You can launch TensorBoard from a Jupyter notebook or from the command line. To use it in a Jupyter notebook, run the code below in a new cell and TensorBoard will appear in the cell output.

%load_ext tensorboard
%tensorboard --logdir logs

To run TensorBoard from the command line, use the command below and then open http://localhost:6006/ to view it.

tensorboard --logdir logs/

After running the above command, you will be able to see TensorBoard as below:

You can see the accuracy and loss of both the training and validation datasets. Next, let’s say you want to compare the performance of two models. You can log them in different directories. For example, we will train two models for cat vs. dog classification: one will be a custom model while the other will be a transfer learning model (I know it’s an unfair comparison 😊).

As you can see, TensorBoard makes it easier to compare the performance of different models, which helps in quickly analyzing experiment results. For example, if we just want to see the training results, we can select only the relevant runs in the left pane.
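One way to keep runs apart is to give each model its own subdirectory under a common log root, so that pointing TensorBoard at the root shows every run side by side. The sketch below only builds the directory names; the helper name make_run_dir, the run names and the timestamp suffix are illustrative assumptions, not part of the original code.

```python
import os
from datetime import datetime


def make_run_dir(log_root, run_name, timestamp=None):
    """Return a per-run log directory of the form <log_root>/<run_name>/<timestamp>."""
    # A timestamp suffix keeps repeated runs of the same model from overwriting each other.
    if timestamp is None:
        timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return os.path.join(log_root, run_name, timestamp)


# Hypothetical run names for the two models being compared.
custom_dir = make_run_dir("logs", "custom_cnn")
transfer_dir = make_run_dir("logs", "transfer_learning")
print(custom_dir)
print(transfer_dir)
```

Each directory would then be passed to its own tf.keras.callbacks.TensorBoard callback, and launching TensorBoard with --logdir logs lists both runs.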

2. Analyzing the Scalars and Metrics

As seen above, TensorBoard plots the metrics by default. You can also add other metrics or parameters. To log a custom value, you need to use a summary writer to write the log. First, define a summary writer.

file_writer = tf.summary.create_file_writer(log_dir + "/metrics")

After executing this, there will be a new directory called metrics under the log directory. To record a value, call the code below.

with file_writer.as_default():
    tf.summary.scalar(name_of_the_scalar, data=value_to_log, step=epoch)

To log a value at every epoch, you need a callback that is triggered every epoch (for example at on_epoch_end); otherwise only a single value will be logged. Inside it, the logging is done using tf.summary.scalar. For this example, we will use LearningRateScheduler. First, we need to define a function that changes the learning rate based on the epoch. This can be done using the function below:

def learning_rate_schedule(epoch):
    learning_rate = 0.001

    if epoch > 1:
        learning_rate = 0.002
    if epoch > 2:
        learning_rate = 0.002 + epoch * 0.001

    with file_writer.as_default():
        tf.summary.scalar('Learning Rate', data=learning_rate, step=epoch)
    return learning_rate

Then wrap this function in tf.keras.callbacks.LearningRateScheduler and add the resulting callback to the list of callbacks in model.fit. After the model starts training, you will be able to monitor one new graph under the Scalars tab.
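Before wiring the schedule into training, it can help to check which learning rate it yields per epoch. The sketch below repeats the schedule logic without the summary-writer call so that it runs on its own; the name schedule_values is an assumption made for this illustration.

```python
def learning_rate_schedule(epoch):
    """Same schedule as in the article, minus the TensorBoard logging."""
    learning_rate = 0.001
    if epoch > 1:
        learning_rate = 0.002
    if epoch > 2:
        learning_rate = 0.002 + epoch * 0.001
    return learning_rate


def schedule_values(num_epochs):
    """List the learning rate for each epoch index (Keras passes 0-based epochs)."""
    return [round(learning_rate_schedule(epoch), 6) for epoch in range(num_epochs)]


for epoch, lr in enumerate(schedule_values(5)):
    print(f"epoch {epoch}: learning rate {lr}")
```

Note that the rate stays at 0.001 for the first two epochs, because the epoch index passed to the schedule starts at 0.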

If you only need a single file writer, you can set it as the default with the syntax below, so that tf.summary calls no longer need the with block.

file_writer.set_as_default()

3. Plotting Image Data

While training deep learning models, it is good to monitor the training progress along with other diagnostics such as the ROC curve, the confusion matrix, etc. These can be plotted as images in TensorBoard and monitored across epochs. You can also log the input or output images of the model. To plot an image in TensorBoard, call the tf.summary.image function.

tf.summary.image("Image data", image, step=EPOCH)

If the image is already a tensor, you can pass it as an argument and it will be plotted. If it is not, it needs to be converted to a tensor before plotting. For this example, we will plot the output of a hidden convolution layer as TensorBoard images. To do this, we first create a grid of the outputs of the convolution layer using the methods below:

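import io
import matplotlib.pyplot as plt

def getFeatureMap():
    # model and img are assumed to be defined earlier in the notebook
    # Sub-model that outputs the activations of the layer at index 6
    earlyPredictor = tf.keras.Model(model.inputs, model.get_layer(index=6).output)
    feature_maps = earlyPredictor.predict(img)[0]
    square = 8  # plot the first 8 x 8 = 64 feature maps
    ix = 1
    figure = plt.figure(figsize=(12, 12))
    for _ in range(square):
        for _ in range(square):
            ax = plt.subplot(square, square, ix)
            ax.set_xticks([])
            ax.set_yticks([])
            plt.imshow(feature_maps[:, :, ix-1], cmap='gray')
            ix += 1
    return figure

def plot_to_image(figure):
    # Render the figure to an in-memory PNG and decode it into a tensor
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    plt.close(figure)
    buf.seek(0)
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    image = tf.expand_dims(image, 0)  # add the batch dimension
    return image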
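The nested loops in getFeatureMap walk a fixed 8 x 8 grid, so the chosen layer is assumed to have at least 64 channels. A minimal sketch of how the grid side and the subplot positions could be derived for an arbitrary channel count follows; grid_layout is a hypothetical helper, not part of the original code.

```python
import math


def grid_layout(num_channels):
    """Return (side, cells) for the largest square grid that num_channels can fill.

    Each cell is (subplot_index, channel_index); subplot indices are 1-based,
    as plt.subplot expects, while channel indices are 0-based.
    """
    side = math.isqrt(num_channels)  # integer square root, rounded down
    cells = [(ix, ix - 1) for ix in range(1, side * side + 1)]
    return side, cells


side, cells = grid_layout(64)
print(side, cells[0], cells[-1])
```

With a helper like this, the hard-coded square = 8 could be replaced by the side computed from the last dimension of the feature maps.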

As with logging scalar values in TensorBoard, logging images requires a callback that records the images at each epoch (if you need epoch-level images). So we will do this inside the LearningRateScheduler callback. We first get the figure using the getFeatureMap function, then convert the figure to a tensor using plot_to_image and write it to the logs using tf.summary.image.

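# file_writer_image is a separate summary writer for images, created the same way
# as file_writer (for example with a path such as log_dir + "/images")
figure = getFeatureMap()
with file_writer_image.as_default():
    tf.summary.image("Feature Map", plot_to_image(figure), step=epoch)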

After the training is done, you will get the output for all the epochs. You can use the slider to see the output of different epochs.

4. Analyzing the Graph Model

TensorFlow uses graphs to optimize models. With the arrival of TensorFlow 2.0, we can use Keras to define the layers while TensorFlow handles the graph under the hood. TensorFlow creates a graph for the defined model to optimize the training.

When you create a model, the graph is generated, and if you use the TensorBoard callback you can see all the layers present in the model under the GRAPHS tab. If you want to focus on one layer, double-click on it. Let’s analyze one convolution layer.

As you can see, double-clicking any convolution layer reveals a Conv2D operation followed by the bias and ReLU. The arrows represent the flow of data, and the numbers on the arrows are the input/output dimensions. On the top right, you can see the name of the layer as well as the names of its input and output layers with their dimensions. If you observe the Dropout layer, you will find something fascinating. As you know, the Dropout layer randomly drops a specified fraction of neurons during training, but during inference it doesn’t drop anything. How does the model handle that? This can be seen in the graph:

If you look closely, there is a keras_learning_phase node that takes one scalar input, and its output goes into every dropout layer. The input to the learning phase is a Boolean that indicates whether the model is in the learning phase or not. This way the dropout rate can be switched from the specified value to 0.
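The effect of that learning-phase switch can be illustrated without TensorFlow. The sketch below is a plain-Python version of inverted dropout, the variant in which kept activations are scaled up during training so that nothing needs to change at inference. The function name, the training flag and the rng parameter are assumptions for illustration, and this is not how Keras implements the layer internally.

```python
import random


def dropout(values, rate, training, rng=random):
    """Zero each value with probability `rate` while training; pass through otherwise.

    Assumes 0 <= rate < 1.
    """
    if not training or rate == 0.0:
        # Inference: the layer behaves like the identity function.
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    # Training: drop with probability `rate`, scale survivors to keep the expected sum.
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, rate=0.5, training=False))
print(dropout(activations, rate=0.5, training=True, rng=random.Random(0)))
```

The training flag plays the role of the Boolean learning-phase input: the same function either drops values or returns them untouched.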

5. Hyperparameter Tuning

Training deep learning networks involves a lot of experimentation. These experiments can range from data augmentation to hyperparameter tuning. TensorBoard provides a way to track these hyperparameter experiments in the HParams dashboard. To use the HParams dashboard, a few changes are needed in the model definition. First, import the HParams API and define the hyperparameters you want to tune. To define an HParam, use the syntax below:

hp.HParam(name, domain=None, display_name=None, description=None)

name: the name of the parameter
domain: the allowed values; it can be discrete (hp.Discrete), an integer interval (hp.IntInterval(min_value=None, max_value=None)) or a real interval (hp.RealInterval(min_value=None, max_value=None))
display_name: the name to be shown in TensorBoard
description: a description of the parameter

Let’s define a few parameters that we want to tune.

import tensorboard.plugins.hparams.api as hp
HP_Filters_layer_1 = hp.HParam('num_filters_layer_1', hp.Discrete([16, 32]))
HP_Filters_layer_2 = hp.HParam('num_filters_layer_2', hp.Discrete([32, 64]))
HP_Kernel = hp.HParam('kernel_size', hp.Discrete([3, 5]))
HP_Optimizer = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))

Then we need to define the file writer and define the configuration for the HParam board using the below code:

with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
    hp.hparams_config(
        hparams=[HP_Filters_layer_1, HP_Filters_layer_2, HP_Kernel, HP_Optimizer],
        metrics=[hp.Metric('accuracy', display_name='Accuracy')])

With this, we have written the high-level information for the HParams dashboard. The next step is to define the model. To do so, we create a function that accepts the log directory, the hyperparameters and the run name.

def configurableModel(dir, hparams, name):
    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=imgsize+(3,)),
        tf.keras.layers.Conv2D(hparams[HP_Filters_layer_1], (hparams[HP_Kernel], hparams[HP_Kernel]), activation='relu'),
        tf.keras.layers.MaxPool2D(),
        # Rest of the layers
    ])

As you can see, in the Conv2D layer we pass hparams[HP_Filters_layer_1] instead of a fixed number of filters. This needs to be done for every layer of the model that has a configurable hyperparameter. The same applies to model.compile, since the optimizer is also one of the hyperparameters.

model.compile(optimizer=hparams[HP_Optimizer], loss='sparse_categorical_crossentropy', metrics=['accuracy'])

Now you need to pass the callback in the model.fit as below:

model.fit(train, batch_size=64, epochs=10, callbacks=[hp.KerasCallback(dir, hparams, trial_id=name)])

dir: the log directory for the run.
hparams: dict of hyperparameters.
trial_id: name for the run with these HParams.

Now we need to run the model with all the possible combinations of the parameters. To do this we just need to call the model function in a nested loop of the hyperparameters:

!rm -rf logs/ #To remove previous logs
run_num = 0
for layer1 in HP_Filters_layer_1.domain.values:
    for layer2 in HP_Filters_layer_2.domain.values:
        for kernel in HP_Kernel.domain.values:
            for optimizer in HP_Optimizer.domain.values:
                run_num += 1
                hparams = {
                    HP_Filters_layer_1: layer1,
                    HP_Filters_layer_2: layer2,
                    HP_Kernel: kernel,
                    HP_Optimizer: optimizer
                }
                # str() is needed: concatenating a str and an int raises a TypeError
                run_name = "run_" + str(run_num)
                acc = configurableModel('logs/hparam_tuning/' + run_name, hparams, run_name)

Once the training is finished, we will be able to see the HPARAMS tab in TensorBoard and a table with the results.

One important thing to consider is that training needs to be run once per configuration (2*2*2*2 = 16 times in our case), which may require a tremendous amount of computing time. So it may be a good idea to run the experiment on a small dataset for a few epochs to find good parameters, or at least to filter out the worst ones.
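The size of the search space can be checked up front by enumerating the grid. This sketch mirrors the four hyperparameters defined earlier with plain Python lists (the dictionary name search_space is an assumption) and uses itertools.product in place of the nested loops.

```python
import itertools

# Plain-Python copy of the hyperparameter domains defined earlier.
search_space = {
    "num_filters_layer_1": [16, 32],
    "num_filters_layer_2": [32, 64],
    "kernel_size": [3, 5],
    "optimizer": ["adam", "sgd"],
}

# One dict per combination, in the same order the nested loops would visit them.
combinations = [
    dict(zip(search_space, values))
    for values in itertools.product(*search_space.values())
]

print(len(combinations))  # 2 * 2 * 2 * 2 = 16
print(combinations[0])
```

Adding one more two-valued hyperparameter doubles the number of training runs, which is why trimming the grid early pays off.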

As we explored TensorBoard, we found that it is of great help in monitoring training in many ways. TensorBoard can also be used with PyTorch.

Code can be found on GitHub.

Happy Deep Learning…