Source: Deep Learning on Medium
Computer vision has been an interesting topic lately due to autonomous cars, augmented reality, ANPR cameras, etc. Cloud computing has long been the way to go because of the computational restrictions on edge devices, but the tables have turned. NVIDIA has done a great job with their Jetson devices, starting with the TX2 and now with the more compact Nano. With this device, it is possible to bring more computation to the edge and shed the tight restrictions that come with doing everything in the cloud.
We are going to do computer vision on the Jetson Nano with a standalone Docker image which can be deployed using Balena if necessary. It features simple object detection with an SSD MobileNet v2 COCO model, optimized with TensorRT for the NVIDIA Jetson Nano and built upon dusty-nv’s Jetson Inference. The GitHub repository that backs up everything referenced in this post can be found below.
The TensorFlow model zoo can help you get started with pre-trained models. Note that those models will not work directly with TensorRT; they first need to be converted to another format before TensorRT can execute them. The dusty-nv repository provides optimized models that are built to be compatible with TensorRT when run within that repository. It is also possible to convert TensorFlow or Caffe models yourself, which is explained later, along with what TensorRT is and why exactly it is needed.
NVIDIA Jetson Nano
Let’s go a little deeper into the Jetson Nano and the dependencies it needs.
The Jetson Nano is the smallest device in the Jetson series. This series enables you to do machine learning on edge devices. The Nano packs 472 GFLOPS for running AI algorithms and can run several neural networks in parallel. It needs only 5 to 10 watts and measures just 70 x 45 mm.
NVIDIA Jetpack 4.2.2 can be flashed onto an SD card and used on the Jetson Nano. It includes all the dependencies and libraries you need to run your machine learning models and do inference on the Jetson Nano. Let’s first take a deeper look at the different components, technologies and dependencies and what exactly they mean.
Linux for Tegra (Linux4Tegra, L4T) is a GNU/Linux based system software distribution by Nvidia for the Nvidia Tegra processor series such as used in the Nvidia Jetson board series. (source)
L4T stands for ‘Linux for Tegra’. It’s a Linux-based system software distribution by NVIDIA for the Tegra processor series, which is used in the Jetson Nano. It’s part of NVIDIA Jetpack, but it can also be built independently into a Docker image, which will be explained later.
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). (source)
As stated above and explained in full detail on the NVIDIA pages, CUDA is needed to run your inference and/or application code on the GPU, which is of course what you want on a Jetson Nano. It also lets you take advantage of libraries such as NVIDIA’s BLAS library to get the best GPU performance. If you want to know more, NVIDIA has a YouTube video.
NVIDIA Container Runtime
The NVIDIA Container Runtime introduced here is our next-generation GPU-aware container runtime. (source)
The NVIDIA Container Runtime can be used with Docker and enables the use of CUDA on your device. In NVIDIA’s own words, it is a next-generation GPU-aware container runtime that enables portability of Docker images that leverage NVIDIA GPUs. You use it by running ‘nvidia-docker’ instead of the standard ‘docker’ command, or by explicitly specifying the runtime as shown below. As will become clearer later on, this is not needed if you pack all the required dependencies into your Dockerfile. The runtime lets you run your Docker images with full awareness of the GPU and with CUDA on your NVIDIA devices and/or GPUs.
docker run --runtime=nvidia <image_name>
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. (source)
The cuDNN dependency accelerates widely used deep learning frameworks. This lets us focus on building neural networks instead of getting them to run on the specific underlying hardware.
NVIDIA TensorRT™ is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications. (source)
TensorRT is highly recommended. It optimizes the computation graph of your model, for example by removing unused layers and fusing operations. Note that not all operations can be transformed into TRT-compatible ones; it all depends on the parsers NVIDIA has implemented. As you will read later on, there are several parsers for several model formats, and not all of them are actively updated.
NVIDIA NvInfer is part of TensorRT and is used to run inference of neural network models on CUDA. These dependencies are needed to install the C++ and Python 3 bindings so that you can work with them.
OpenCV (Open source computer vision) is a library of programming functions mainly aimed at real-time computer vision. (source)
OpenCV is well known in the computer vision world and has a lot of useful features that can be used. Although it is probably not necessary to include this, depending on your exact use case, it’s not a bad idea to have this already baked into your Docker image. Chances are you will need it at some point.
TensorRT Optimized Models
Let us first look at compatible file extensions and the differences between a TRT engine and a TRT-compatible model. TensorRT is still under heavy development, so expect occasional roadblocks and unresolved issues on the NVIDIA forum. Documentation is sparse and the information is scattered, so it’s not always clear yet what the way to go is.
compatible model types
There are several types of models that can be used to build TRT engines. The most popular are .uff, .onnx and .caffemodel. Note that not all built models are directly compatible with TRT. It can happen that you’ve successfully built a uff or onnx model but that it does not run with TRT. There is a wide variety of models that are TRT compatible made by dusty-nv and can be found on his GitHub page. You can always try to convert your TensorFlow models to onnx by using this converter or to uff by using this other converter. Frozen inference graphs (.pb) of TensorFlow can be converted to, for example, uff or onnx format as explained earlier. These formats are compatible with TRT, if built correctly.
engine vs model file
If you provide a model file it will first need to be built into a TRT engine, which will take some time. If you already have the binary engine file, it does not need to be built first.
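The build-once, reuse-afterwards idea can be sketched in a few lines of Python. This is only an illustration of the caching pattern, not the way Jetson Inference implements it: `load_or_build_engine` and the `build_fn` callback are hypothetical names, and the actual TensorRT build step is deliberately left to whatever function you pass in.

```python
from pathlib import Path
from typing import Callable


def load_or_build_engine(model_path: str, engine_path: str,
                         build_fn: Callable[[bytes], bytes]) -> bytes:
    """Return serialized engine bytes, building them only if no cached file exists.

    `build_fn` is a hypothetical callback that turns raw model bytes into
    serialized engine bytes (in practice this would wrap the TensorRT builder).
    """
    engine_file = Path(engine_path)
    if engine_file.exists():
        # Fast path: reuse the previously built binary engine as-is.
        return engine_file.read_bytes()
    # Slow path: build from the model file, then cache the result on disk.
    engine_bytes = build_fn(Path(model_path).read_bytes())
    engine_file.write_bytes(engine_bytes)
    return engine_bytes
```

Keep in mind that a serialized engine is generally tied to the TensorRT version and the GPU it was built on, so a cached engine file should be rebuilt when either of those changes.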
If you can, convert your model into a TRT-compatible format and make sure it can be run with TRT alone. This gives you the best performance, no dependency on TensorFlow, and the most portable solution. You can test your TRT-compatible models and your TRT engines by executing the following command on a Jetson Nano with Jetpack installed, or within a Docker container with L4T and all the other dependencies, as explained later.
$ /usr/src/tensorrt/bin/trtexec
Mandatory params:
  --deploy=<file>          Caffe deploy file
  OR --uff=<file>          UFF file
  OR --onnx=<file>         ONNX Model file
  OR --loadEngine=<file>   Load a saved engine

Mandatory params for UFF:
  --uffInput=<name>,C,H,W  Input blob name and its dimensions for UFF parser (can be specified multiple times)
  --output=<name>          Output blob name (can be specified multiple times)

Mandatory params for Caffe:
  --output=<name>          Output blob name (can be specified multiple times)
As you can see, uff models still need to have the input name and output name because they’re not included in the model itself. This is not needed for example with onnx models, because everything is already baked into that model. It is possible to get to know the exact input and output layer names by installing summarize_graph from the TensorFlow tools and running this from inside the TensorFlow folder.
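curl -L -O https://github.com/tensorflow/tensorflow/archive/v1.8.0.tar.gz
tar xzf v1.8.0.tar.gz && cd tensorflow-1.8.0
bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=<path_to_frozen_graph.pb>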
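Coming back to the trtexec parameters listed earlier, the rule "uff needs input and output names, onnx does not" can be captured in a small helper. The following is a minimal sketch that only assembles the argument list. The function name `trtexec_args` is made up for this post, the `.engine` extension is my own convention (engine files have no fixed extension), and Caffe is left out to keep it short. Only flags that appear in the help output above are used.

```python
from pathlib import Path
from typing import List, Optional, Tuple

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"  # path as used in this post


def trtexec_args(model: str,
                 uff_input: Optional[Tuple[str, int, int, int]] = None,
                 outputs: Optional[List[str]] = None) -> List[str]:
    """Build a trtexec argument list based on the file extension of `model`."""
    suffix = Path(model).suffix.lower()
    if suffix == ".onnx":
        # onnx files carry their own input and output names.
        return [TRTEXEC, f"--onnx={model}"]
    if suffix == ".engine":  # assumed extension for a saved engine
        return [TRTEXEC, f"--loadEngine={model}"]
    if suffix == ".uff":
        # uff files do not store blob names, so both must be supplied.
        if uff_input is None or not outputs:
            raise ValueError("uff models need an input (name, C, H, W) and at least one output")
        name, c, h, w = uff_input
        args = [TRTEXEC, f"--uff={model}", f"--uffInput={name},{c},{h},{w}"]
        return args + [f"--output={out}" for out in outputs]
    raise ValueError(f"unsupported model type: {suffix!r}")
```

For example, `trtexec_args("ssd.uff", ("Input", 3, 300, 300), ["NMS"])` yields a list that can be handed to `subprocess.run`. The blob names and dimensions in that call are placeholders, so take the real ones from your own graph.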
TensorFlow model => uff model & TRT engine
TensorFlow models can be converted to a compatible uff model and a compatible engine by using this converter in a Google Colab. It explains the process of transforming TensorFlow models to TensorRT. Note that for SSD MobileNet, the libflattenconcat.so plugin has to be included to be able to convert it into a working TRT-compatible engine and uff model. This is because the frozen model uses some TensorFlow operations that are not natively supported in TensorRT, like FlattenConcat. You can also find extra help on the NVIDIA forum here, about getting an SSD MobileNet TensorFlow model to a uff model and TRT engine. It is also possible to build the libflattenconcat.so file from files found here. Note that this is already included in the dusty-nv inference repository.
The NMS operation is not correctly mapped to the NMS_TRT operation, so a little preprocessing has to be done to map those operations. The FlattenConcat operation is not yet incorporated into TensorRT 5 (but will be in a future TensorRT 6 release), so it still has to be made available as a plugin, both when creating the uff model and engine and during TensorRT execution.
I’ve also read that the uff parser in TensorRT is not actively updated, so it’s best to look into the onnx models.
TensorFlow model => onnx model & TRT engine
Onnx models can be obtained from TensorFlow models with this converter. I did not have a lot of success with onnx: the converter successfully builds an onnx model from an SSD MobileNet V2 model taken from the TensorFlow model zoo, but the result fails to execute with the TRT executor.
python3 -m tf2onnx.convert --graphdef fgraph.pb --output frozen.onnx --fold_const --opset 10 --inputs image_tensor:0 --outputs num_detections:0,detection_classes:0,detection_scores:0,detection_boxes:0
As already stated, chances are that onnx models will have more success with upcoming TensorRT versions, because the onnx parser is more actively maintained than the uff parser.
TensorFlow model => TRT engine
TRT is also built into the latest versions of TensorFlow and is called TF-TRT. TensorFlow models can be used, although this requires a hard dependency on TensorFlow itself, which is not really needed. Operations that are not TRT compatible will be executed by standard TensorFlow.
TensorRT inference server
If you use the TensorRT inference server, you get an API through which you can request the TRT engine build progress of your custom models. I do not have experience with this, because it requires a hard dependency on the server and is hard to use from a custom application. This server is also available in the Jetpack on the Jetson Nano.
As stated previously, every inference application can be run on a Jetson Nano with Jetpack installed, but it is also possible to do this completely in Docker by using a Balena base image, “jetson-nano-ubuntu:bionic”. Adding everything together, we get the following components, which need to be built into the Docker image to replace the NVIDIA Jetpack.
- NVIDIA L4T
- NVIDIA CUDA
- NVIDIA cuDNN
- NVIDIA NvInfer
- NVIDIA TensorRT
All dependencies and packages are listed in the README. They are obtained by unpacking the NVIDIA Jetpack, which is automatically installed when you flash your Jetson Nano SD card. This is needed to be able to run this image without any dependencies other than the standard Docker engine, which is useful if you want to deploy it on, for example, Balena.
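Once the image is built, a quick sanity check is to ask the dynamic linker inside the container whether it can resolve the GPU libraries. The snippet below is a rough sketch of such a check. The short library names (`cudart`, `cudnn`, `nvinfer`) are my assumptions for the CUDA, cuDNN and NvInfer components listed above, and `find_gpu_libs` is just a helper name made up here. L4T itself is not a single library, so it is not covered.

```python
import ctypes.util
from typing import Dict, Iterable, Optional

# Assumed short library names for CUDA runtime, cuDNN and NvInfer.
REQUIRED_LIBS = ("cudart", "cudnn", "nvinfer")


def find_gpu_libs(names: Iterable[str] = REQUIRED_LIBS) -> Dict[str, Optional[str]]:
    """Map each library name to what the system linker resolves it to (None if missing)."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, resolved in find_gpu_libs().items():
        print(f"{name}: {resolved or 'NOT FOUND'}")
```

This only finds libraries that the linker configuration knows about, so a "NOT FOUND" can also mean that a library was copied into the image but its directory was never registered.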