A comprehensive guide for profiling a deep learning model

Source: Deep Learning on Medium

Traditionally, work on deep learning models focused on improving accuracy, and a great deal of innovation went into creative algorithms for building more accurate models. That focus began to shift in 2017, when Stanford University introduced the DAWNBench competition at the NIPS conference. The main goal of this competition was not only to improve model accuracy but also to reduce the computation time and hardware cost of deep learning models, since these are among the most critical resources in building them.

With that in mind, many techniques are available for reducing a model’s computation time. One such technique is explained here. Before trying to reduce training time, however, we need to know where the network spends most of its time during training, so that we can focus our effort on that part. To find out, we profile the model architecture to check the average time taken by each layer and how the CPU and GPU are used during training.

In this post we discuss how to do this profiling in TensorBoard using Python.

We will also learn how to use a proper data pipeline so that the CPU and GPU are used optimally and the time per epoch drops significantly, and how to visualize the result in TensorBoard.

TensorBoard

TensorBoard is the interface used to visualize the graph, along with other tools for understanding, debugging, and optimizing the model. To understand how it works, let’s take a deep dive into it.

Before going deeper into TensorBoard, let’s create a model architecture of our own whose performance we will then visualize with it.

So, let’s create a CNN-based model architecture with TensorFlow:

Let’s import the libraries we need:

Now, let’s create a DenseNet architecture for the CIFAR-10 dataset:
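The original code for this step is not shown here, so the following is only a minimal sketch of a small DenseNet-style network for 32x32 images. The function names (dense_block, transition, build_densenet) and the hyperparameters (three blocks, four layers per block, growth rate 12) are assumptions chosen for illustration, not the author’s exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers


def dense_block(x, num_layers, growth_rate):
    """Stack BN-ReLU-Conv layers, concatenating each output onto its input."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)(y)
        x = layers.Concatenate()([x, y])
    return x


def transition(x, compression=0.5):
    """Compress the channel count with a 1x1 conv and halve the spatial size."""
    channels = int(x.shape[-1] * compression)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(channels, 1, use_bias=False)(x)
    return layers.AveragePooling2D(2)(x)


def build_densenet(input_shape=(32, 32, 3), num_classes=10,
                   num_blocks=3, layers_per_block=4, growth_rate=12):
    """Hypothetical small DenseNet; all sizes here are illustrative assumptions."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(2 * growth_rate, 3, padding="same", use_bias=False)(inputs)
    for block in range(num_blocks):
        x = dense_block(x, layers_per_block, growth_rate)
        if block < num_blocks - 1:  # no transition after the last block
            x = transition(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs, name="densenet_sketch")


model = build_densenet()
```

Calling model.summary() prints the layers, which is a quick way to check that the concatenations grow the channel count as expected.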

Now that the model architecture is ready, let’s preprocess the CIFAR-10 dataset and compile the model:
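As a hedged sketch of this step, the helpers below scale the images, wrap the arrays in a tf.data pipeline, and compile a model. The names preprocess, make_dataset, and compile_model are hypothetical, and the Adam optimizer, batch size, and shuffle buffer are assumptions. tf.data.AUTOTUNE assumes TensorFlow 2.4 or later. The prefetch call is what lets the CPU prepare the next batch while the GPU is still training on the current one.

```python
import numpy as np
import tensorflow as tf


def preprocess(images: np.ndarray, labels: np.ndarray):
    """Scale pixels to [0, 1] and flatten labels to a 1-D integer vector."""
    images = images.astype("float32") / 255.0
    labels = labels.reshape(-1).astype("int64")
    return images, labels


def make_dataset(images, labels, batch_size=128, training=True, shuffle_buffer=10_000):
    """Wrap arrays in a tf.data pipeline that overlaps input work with training."""
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    if training:
        ds = ds.shuffle(buffer_size=shuffle_buffer)
    # prefetch prepares upcoming batches on the CPU while the current one trains
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)


def compile_model(model, learning_rate=1e-3):
    """Compile for integer labels and a softmax output (optimizer is an assumption)."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

With the real data, tf.keras.datasets.cifar10.load_data() returns uint8 images and labels of shape (N, 1), which can be passed straight through preprocess and then make_dataset.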

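Next, we create a directory for the logs and define our TensorBoard callback. In the TensorFlow version used here, the callback profiles the second batch by default. We can change this by setting profile_batch; since the default has differed between TensorFlow releases, setting it explicitly is the safest choice.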
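A minimal sketch of this step might look as follows. The helper name make_tensorboard_callback, the timestamped folder layout, and histogram_freq=1 are assumptions for illustration; log_dir, histogram_freq, and profile_batch are parameters of tf.keras.callbacks.TensorBoard.

```python
import datetime
import os

import tensorflow as tf


def make_tensorboard_callback(root="logs", profile_batch=2):
    """Create a timestamped log directory and a TensorBoard callback for it."""
    run_name = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    log_dir = os.path.join(root, run_name)
    os.makedirs(log_dir, exist_ok=True)
    callback = tf.keras.callbacks.TensorBoard(
        log_dir=log_dir,
        histogram_freq=1,             # write weight histograms every epoch
        profile_batch=profile_batch,  # batch to profile; 0 disables profiling
    )
    return callback, log_dir
```

Using a fresh timestamped folder per run keeps the logs of different experiments apart, so they show up as separate runs in TensorBoard.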

Now let’s run the model for 5 epochs; afterwards we will view the logs in the TensorBoard GUI.
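The training call itself can be sketched as below. The wrapper name train is hypothetical; it simply forwards to model.fit, with epochs=5 matching the text and the callbacks list expected to contain the TensorBoard callback.

```python
import tensorflow as tf


def train(model: tf.keras.Model, train_data, val_data, callbacks, epochs=5):
    """Fit the model; a TensorBoard callback in `callbacks` writes logs as training runs."""
    return model.fit(
        train_data,
        validation_data=val_data,
        epochs=epochs,
        callbacks=callbacks,
    )
```

The returned History object holds the per-epoch loss and metrics, which are the same scalars TensorBoard plots.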

Now that training is complete, let’s visualize the result:

There are two ways to view the logs in TensorBoard:

  • We can view the result directly in the browser. This requires no local dependencies: we can run the code snippet below and open TensorBoard at the link it generates.
  • We can also do the same on our local machine. First, zip the logs with the command !zip -r /content/logs.zip /content/logs and download the archive. Once it is downloaded, unzip it. Then open a command prompt and move to the directory that contains the logs folder using cd logs_directory. Finally, run tensorboard --logdir=logs and open TensorBoard at http://localhost:6006.
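For the local option, the zip step can also be done from Python instead of a shell command. This is a sketch using the standard library’s shutil.make_archive; the helper name archive_logs is hypothetical, and the paths in the usage note are the ones from the zip command above.

```python
import os
import shutil


def archive_logs(log_dir, archive_base):
    """Zip log_dir so that the archive unpacks to a single folder of the same name."""
    log_dir = os.path.abspath(log_dir)
    return shutil.make_archive(
        archive_base,                           # output path without the .zip suffix
        "zip",
        root_dir=os.path.dirname(log_dir),      # parent folder of the logs
        base_dir=os.path.basename(log_dir),     # folder name stored in the archive
    )
```

For example, archive_logs("/content/logs", "/content/logs") would write /content/logs.zip, which unpacks to a logs folder that tensorboard --logdir=logs can read.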

Once the TensorBoard GUI is open, we see something like the following:

If we now click on the Profile tab, we see profile information like the following:

The Profile tab shows all the trace information generated by our model. These events are collected from three different sources:

  • CPU: CPU events are under the event group named /host:CPU. Each track represents a thread on the CPU. Examples include input pipeline events, GPU op scheduling events, and CPU op execution events.
  • GPU: GPU events are under event groups prefixed by /device:GPU:. Except for stream:all, each event group represents one stream on the GPU; stream:all aggregates all events on one GPU. Examples include memory copy events and kernel execution events.
  • TensorFlow runtime: Runtime events are under event groups prefixed by /job:. They represent the TensorFlow ops invoked by the Python program, for example tf.function execution events.

In this section we have discussed profiling with TensorBoard only. There are other ways to profile as well: we can instrument our TensorFlow code manually, or we can use cProfile.
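As a rough illustration of the cProfile route, the sketch below profiles an arbitrary Python callable and returns a text report sorted by cumulative time. The helper profile_call and the stand-in workload slow_sum are hypothetical; in practice the callable would be a training step or a whole epoch. Note that cProfile measures time spent in Python-level calls only, so it does not break down what happens inside GPU kernels.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report of the hottest calls)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)  # keep only the top entries
    return result, buffer.getvalue()


def slow_sum(n):
    """Stand-in workload; replace with a training step in real use."""
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 100_000)
print(report)
```

Sorting by cumulative time puts the outermost expensive calls first, which is usually the quickest way to see whether time goes to data loading, the training step, or something else.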


  1. https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras
  2. The School Of AI tutorial by Rohan Sravan.