Deep Learning Environment Set-up from a Beginner’s Viewpoint: A Step-by-Step Guide




We are living in the world of AI, and everyone is trying to get into this field. This article is for those who want to get their hands dirty and try things by themselves. It is a step-by-step guide that explains how to start any deep learning project and covers the following steps:

i) Setting up an AWS GPU Instance

ii) Installing Docker and creating an image

iii) Setting up the prerequisites, choosing a model, training it and running inference

Let’s start with the first step:

Setting Up an AWS GPU Instance

GPUs (Graphics Processing Units) are specialized processors originally created for computer graphics. Modern GPUs contain many simple processing cores and are highly parallel, which makes them very effective at algorithms made of many independent, identical operations, such as the matrix multiplications at the heart of neural network training. First, we need to understand why we need a GPU for deep learning at all. Almost every article on deep learning says that training a model requires a lot of computation, and that is true. However, it is not true that a model can only be trained on a GPU: you can train a model on your laptop. But training a model on a laptop is painfully slow.
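To see why this kind of parallelism suits deep learning, consider a matrix product, the core operation of a neural network layer. Every output cell is the dot product of one row and one column and does not depend on any other cell, so all cells can be computed at the same time. Below is a minimal Python sketch of that idea; the thread pool merely stands in for GPU cores (in CPython it will not actually make this faster), and the function names are our own.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_cell(a, b, i, j):
    """Compute one output cell C[i][j]: the dot product of row i of A and column j of B."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def parallel_matmul(a, b, workers=4):
    """Multiply A (n x m) by B (m x p), handing every output cell to a worker."""
    rows, cols = len(a), len(b[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    # No cell depends on another, so the workers never have to wait for each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(lambda ij: matmul_cell(a, b, ij[0], ij[1]), cells))
    # pool.map keeps the input order, so the flat list can be cut back into rows.
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

A GPU applies the same principle with thousands of cores working on the cells of much larger matrices.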

So, if you don’t want to spend too much time getting your model trained, you need a GPU. A very good article on this is here. NVIDIA GPUs are considered the best choice for deep learning because almost every deep learning framework supports them. There are two ways to use NVIDIA GPUs. The first is to buy a graphics card, install it in your own computer and set it up for your deep learning framework to use, but this is costly, especially for beginners. The second is to use GPUs in a virtual environment such as an AWS EC2 (Amazon Elastic Compute Cloud) instance. For that, all you need is a valid AWS account. Below are the steps to set up a GPU instance on AWS:

  • Create an account on AWS here. You can also use the AWS Free Tier, which includes 750 hours of Linux and Windows t2.micro instances each month for one year (GPU instances are not covered by the Free Tier and are billed normally).
  • Launch an EC2 instance.
  • Select Ubuntu 16.04.
  • Select g2.2xlarge: 8 vCPUs, 15 GB RAM, 60 GB SSD instance storage, 1 NVIDIA GRID K520 GPU (Note: you can choose another instance type depending on your requirements).
  • Select an availability zone (note that the price varies by zone).
  • Protect against accidental termination (optional).
  • Add storage: 120 GB (depending on your requirements; if you are still in the learning phase, 120 GB is more than enough).
  • Add tags such as name and env…
  • Select a security group.
  • Launch the instance and choose a key pair.
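If you prefer scripting to clicking through the console, the same choices map onto the run_instances call of the AWS SDK for Python (boto3). The sketch below is an illustration under assumptions, not an official recipe: it assumes boto3 is installed and AWS credentials are configured, uses g2.2xlarge (the EC2 API name of the G2 size with the specifications listed above), and takes a hypothetical AMI ID, key pair name and security group ID that you must replace with your own.

```python
def build_launch_params(ami_id, key_name, security_group_id, availability_zone,
                        volume_gb=120, name="deep-learning-gpu"):
    """Collect the console choices above as keyword arguments for run_instances()."""
    return {
        "ImageId": ami_id,  # an Ubuntu 16.04 AMI ID for your region (placeholder)
        "InstanceType": "g2.2xlarge",  # 8 vCPUs, 15 GiB RAM, 1 GRID K520 GPU
        "KeyName": key_name,  # the key pair whose .pem file you download
        "SecurityGroupIds": [security_group_id],
        "Placement": {"AvailabilityZone": availability_zone},
        "DisableApiTermination": True,  # protection against accidental termination
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/sda1",  # root volume device name on Ubuntu AMIs
            "Ebs": {"VolumeSize": volume_gb, "VolumeType": "gp2"},
        }],
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": name}],
        }],
    }


def launch(params, region):
    """Start the instance; requires boto3 and configured AWS credentials."""
    import boto3  # imported here so build_launch_params works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # Hypothetical placeholder values: replace them before calling launch().
    params = build_launch_params("ami-xxxxxxxx", "my-key", "sg-xxxxxxxx", "us-east-1a")
    print(params["InstanceType"], params["BlockDeviceMappings"][0]["Ebs"]["VolumeSize"])
```

Keeping the parameters in a plain dictionary makes it easy to review exactly what will be requested before any billable call such as launch(params, "us-east-1") is made.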

You will get an SSH key (a .pem file), which you have to use every time you connect to the AWS instance from a terminal. You will find this key in your downloads folder or in the directory you chose. With that, the virtual machine for your deep learning model is ready. The next step is to install the NVIDIA driver and CUDA on this instance. The steps are the following:

Connect to the instance

Navigate to the directory where you stored your SSH key, restrict its permissions (ssh refuses to use a private key that other users can read) and connect to your instance from the terminal:

$ chmod 400 your_ssh_key.pem
$ ssh -i "your_ssh_key.pem" ubuntu@[your instance public IP address]

You are now connected to the instance from your terminal.
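The chmod step matters because OpenSSH ignores a private key that is readable by group or other users. A small Python helper (our own, hypothetical names) can check that before building the ssh command used above.

```python
import os
import stat


def key_permissions_ok(path):
    """True if no group/other permission bits are set, which ssh requires for private keys."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


def ssh_command(key_path, host, user="ubuntu"):
    """Build the ssh argument list, refusing keys that ssh would reject."""
    if not key_permissions_ok(key_path):
        raise PermissionError("{} is too open; run: chmod 400 {}".format(key_path, key_path))
    return ["ssh", "-i", key_path, "{}@{}".format(user, host)]
```

For example, subprocess.call(ssh_command("your_ssh_key.pem", "203.0.113.10")) opens the session, where 203.0.113.10 is only a documentation placeholder for your instance’s public IP address.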

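Updating the system packages

Update the package lists and the installed packages first; the NVIDIA driver itself is installed together with the cuda package in the next step:

$ sudo apt-get update
$ sudo apt-get upgrade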

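Installing CUDA

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb # if dpkg reports that NVIDIA's public GPG key is not installed, run the apt-key command it prints
$ sudo apt-get update # make apt aware of the newly added CUDA repository
$ sudo apt-get install cuda # installs the CUDA toolkit together with the NVIDIA driver
$ nvidia-smi

After the last command (you may have to reboot the instance first with sudo reboot and reconnect), nvidia-smi should print a table showing the driver version and the GPU, a GRID K520 on this instance type.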
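For scripts, nvidia-smi also has a machine-readable query mode (--query-gpu together with --format=csv,noheader). The sketch below parses that output into one dictionary per GPU; the field list is our own choice, and the code avoids newer syntax so it also runs on the Python 3.5 that ships with Ubuntu 16.04.

```python
import csv
import io
import subprocess

FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Turn `nvidia-smi --query-gpu=... --format=csv,noheader` output into one dict per GPU."""
    reader = csv.reader(io.StringIO(text.strip()), skipinitialspace=True)
    return [dict(zip(FIELDS, row)) for row in reader if row]


def query_gpus():
    """Run nvidia-smi on this machine and return the parsed GPU list."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_gpu_csv(output)
```

An empty list from query_gpus() (or an exception because nvidia-smi is missing) means the driver installation did not succeed.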

Installing Docker and creating an image

$ sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce
$ sudo docker version

This should print version information for both the Docker client and the Docker server (daemon).

$ wget https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
$ sudo dpkg -i nvidia-docker_1.0.1-1_amd64.deb
$ systemctl list-units --type=service | grep -i nvidia-docker # the nvidia-docker service should be listed as running
$ sudo adduser ubuntu docker # log out and reconnect so that the group change takes effect
$ nvidia-docker run --rm nvidia/cuda nvidia-smi
$ nvidia-docker run --rm nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04 nvidia-smi # pull and test the CUDA 9.0 / cuDNN 7 image used below
$ nvidia-docker images

This lists all the Docker images downloaded so far.

Next, create a working directory on the instance and start a container from that image, mounting the directory as /workspace and publishing ports 8888 (Jupyter) and 6006 (TensorBoard):

$ mkdir -p ~/.docker/tf_models # the directory that is shared with the container as /workspace
$ nvidia-docker run -it -p 8888:8888 -p 6006:6006 -v /home/ubuntu/.docker/tf_models:/workspace -w /workspace nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
$ nvidia-docker ps -a # after leaving the container: list all containers with their IDs
$ nvidia-docker start 3dcc4e423096 # replace 3dcc4e423096 with your container ID
$ nvidia-docker attach 3dcc4e423096
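Instead of copying the container ID by hand, you can ask Docker for a tab-separated listing through a Go template (--format) and pick the newest container created from our image. A minimal sketch with our own helper names; it relies on docker ps listing the most recently created containers first.

```python
import subprocess

PS_FORMAT = "{{.ID}}\t{{.Image}}\t{{.Status}}"


def parse_ps(text):
    """Split `docker ps -a --format PS_FORMAT` output into dicts."""
    containers = []
    for line in text.splitlines():
        if line.strip():
            container_id, image, status = line.split("\t", 2)
            containers.append({"id": container_id, "image": image, "status": status})
    return containers


def newest_container_for(text, image):
    """Return the ID of the first (newest) container created from `image`, or None."""
    for container in parse_ps(text):
        if container["image"] == image:
            return container["id"]
    return None


def docker_ps_output():
    """Run `docker ps -a` on the instance (the user must be in the docker group)."""
    return subprocess.check_output(["docker", "ps", "-a", "--format", PS_FORMAT],
                                   universal_newlines=True)
```

For example, newest_container_for(docker_ps_output(), "nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04") returns the ID to pass to nvidia-docker start and attach.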

Great, you are all set to start training your first deep learning model!

cuDNN

Now register an account on NVIDIA’s Accelerated Computing Developer Program and download the cuDNN version that matches your CUDA version to your local machine. In my case this is cuDNN 7.0.5 for CUDA 9.0. Upload the files to the AWS instance, into the directory that is mounted into the container, with the following command:

$ scp -i ~/your_ssh_key.pem ~/folder_tar_file/cudnn-9.0-linux-x64-v5.0-ga.tgz ubuntu@[your instance public IP address]:/home/ubuntu/.docker/tf_models

Inside the container, where the uploaded file appears in /workspace, extract the archive:

$ tar -zxvf cudnn-9.0-linux-x64-v5.0-ga.tgz

Install the packages inside the Docker container with the following commands (you are root inside the container, so sudo is not needed):

$ dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb
$ dpkg -i libcudnn7-dev_7.0.5.15-1+cuda9.0_amd64.deb
$ dpkg -i libcudnn7-doc_7.0.5.15-1+cuda9.0_amd64.deb
$ apt-get update # refresh the package lists before installing anything with apt-get
$ apt-get install bzip2 git curl wget vim protobuf-compiler libgomp1 build-essential unzip -y
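To confirm that the library can actually be loaded, you can call cuDNN’s own cudnnGetVersion() through ctypes. In cuDNN 7 the returned number encodes the version as MAJOR * 1000 + MINOR * 100 + PATCH, so 7.0.5 is reported as 7005. This sketch assumes the library file is named libcudnn.so.7, as installed by the libcudnn7 package above.

```python
import ctypes


def decode_cudnn_version(number):
    """Split a cuDNN 7 version number (MAJOR*1000 + MINOR*100 + PATCH) into a tuple."""
    return number // 1000, (number % 1000) // 100, number % 100


def loaded_cudnn_version(library="libcudnn.so.7"):
    """Load cuDNN and return its version tuple, or None if the library cannot be loaded."""
    try:
        cudnn = ctypes.CDLL(library)
    except OSError:
        return None  # not found on the dynamic loader's search path
    cudnn.cudnnGetVersion.restype = ctypes.c_size_t  # size_t cudnnGetVersion(void)
    return decode_cudnn_version(cudnn.cudnnGetVersion())
```

Inside the container, loaded_cudnn_version() should return (7, 0, 5) for the packages installed above.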

Setting up the prerequisites, choosing a model, training and running inference

Set up the prerequisites:

Download Anaconda inside the Docker container and install it (answer yes when the installer offers to add Anaconda to your PATH in ~/.bashrc):

$ wget https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh
$ bash Anaconda3-4.4.0-Linux-x86_64.sh

Reload your shell configuration so that the Anaconda Python is used:

$ source ~/.bashrc

Choosing the model

As an example, let’s build an object detection model. For this we will use open source libraries; fortunately, we live in a world of knowledge sharing. Google’s TensorFlow is one of the best open source libraries: it provides not only a deep learning framework but also a zoo of pretrained models for different purposes. Let’s clone the TensorFlow models Git repository inside our Docker container:

$ git clone https://github.com/tensorflow/models.git
# change directory to research
$ cd models/research

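Install the dependencies:

$ pip3 install tensorflow-gpu==1.12.0 # a 1.x release built for CUDA 9.0 and cuDNN 7; newer releases need a newer CUDA
$ apt-get install protobuf-compiler python-pil python-lxml python-tk # no sudo needed: you are root inside the container
$ pip3 install Cython
$ pip3 install jupyter
$ pip3 install matplotlib
#Replace pip3 with pip, if using python 2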
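Before starting a long training run, it is worth checking from Python that TensorFlow actually sees the GPU. A small sketch: it uses tf.test.is_gpu_available() on TensorFlow 1.x, tf.config.list_physical_devices("GPU") where that newer function exists, and simply returns False when TensorFlow cannot be imported.

```python
def tensorflow_sees_gpu():
    """Return True if the installed TensorFlow can use at least one GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TensorFlow missing, or a CUDA/cuDNN library failed to load
    list_devices = getattr(getattr(tf, "config", None), "list_physical_devices", None)
    if list_devices is not None:  # TensorFlow 2.1 and later
        return len(list_devices("GPU")) > 0
    return bool(tf.test.is_gpu_available())  # TensorFlow 1.x


if __name__ == "__main__":
    print("GPU visible to TensorFlow:", tensorflow_sees_gpu())
```

If this prints False inside the container, recheck the nvidia-docker and cuDNN steps before going further.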

Let’s retrain our object detection model on the MS COCO dataset. For this we need to download and build the COCO API:

$ cd ../
$ git clone https://github.com/cocodataset/cocoapi.git
$ cd cocoapi/PythonAPI
$ make
$ cp -r pycocotools /workspace/models/research/

Protobuf compiler: from the /workspace directory, run the following commands:

$ cd /workspace
$ mkdir protoc_3.3 # depending upon the version of protoc you are using
$ cd protoc_3.3
$ wget https://github.com/google/protobuf/releases/download/v3.3.0/protoc-3.3.0-linux-x86_64.zip
$ chmod 775 protoc-3.3.0-linux-x86_64.zip
$ unzip protoc-3.3.0-linux-x86_64.zip
$ cd /workspace/models/research/
$ /workspace/protoc_3.3/bin/protoc object_detection/protos/*.proto --python_out=.
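protoc writes one *_pb2.py module next to every .proto file, and a missing module is a common cause of import errors later. A quick check to run from the research directory (the helper name is our own):

```python
from pathlib import Path


def missing_pb2_modules(protos_dir="object_detection/protos"):
    """Return the .proto files in protos_dir that have no generated *_pb2.py module."""
    protos = Path(protos_dir)
    return sorted(
        proto.name
        for proto in protos.glob("*.proto")
        if not (protos / (proto.stem + "_pb2.py")).exists()
    )
```

An empty list from missing_pb2_modules() means every proto was compiled.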

Start working with the TensorFlow Object Detection API: run the following commands from the research directory (the export only applies to the current shell, so repeat it in every new shell):

$ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
$ python setup.py build
$ python setup.py install

Test the installation by running the following command from the research directory:

$ python object_detection/builders/model_builder_test.py

Download the dataset and prepare it for training: from the object_detection directory (the paths used below assume /workspace/models/research/object_detection), run the following commands:

$ mkdir mscoco_dataset_2017
$ cd mscoco_dataset_2017
$ wget http://images.cocodataset.org/zips/train2017.zip
$ wget http://images.cocodataset.org/zips/val2017.zip
$ wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
$ wget http://images.cocodataset.org/zips/test2017.zip
$ unzip train2017.zip
$ unzip val2017.zip
$ unzip annotations_trainval2017.zip
$ unzip test2017.zip
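Before converting the data, a quick sanity check helps. The instances_*.json files in the annotations folder are plain JSON with images, annotations and categories lists, and every image entry carries a file_name. The sketch below (our own helper names) counts those entries and lists images that the annotations reference but that are missing on disk.

```python
import json
from pathlib import Path


def load_coco(annotation_file):
    with open(annotation_file) as f:
        return json.load(f)


def summarize_coco(annotation_file):
    """Count images, object annotations and categories in a COCO instances file."""
    data = load_coco(annotation_file)
    return {key: len(data[key]) for key in ("images", "annotations", "categories")}


def missing_images(annotation_file, image_dir):
    """Return file names listed in the annotations that are not present in image_dir."""
    image_dir = Path(image_dir)
    return [image["file_name"] for image in load_coco(annotation_file)["images"]
            if not (image_dir / image["file_name"]).exists()]
```

For example, from mscoco_dataset_2017 run summarize_coco("annotations/instances_val2017.json") and missing_images("annotations/instances_val2017.json", "val2017").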

The TensorFlow Object Detection API expects the dataset in a special format called TFRecord, so we have to convert the dataset to TFRecord files.
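Under the hood, a TFRecord file is a sequence of length-prefixed records, each protected by a masked CRC-32C checksum; in the Object Detection API each record’s payload is a serialized tf.train.Example holding one image and its boxes. The pure-Python sketch below writes and reads that framing for raw byte strings, only to show the layout. For real data, use TensorFlow’s own writer (tf.python_io.TFRecordWriter, or tf.io.TFRecordWriter in newer releases).

```python
import struct


def crc32c(data):
    """Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    """TFRecord's masked checksum: rotate the CRC right by 15 bits and add a constant."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(f, payload):
    """Append one record: uint64 length, CRC of length, payload, CRC of payload."""
    header = struct.pack("<Q", len(payload))
    f.write(header)
    f.write(struct.pack("<I", masked_crc(header)))
    f.write(payload)
    f.write(struct.pack("<I", masked_crc(payload)))


def read_records(f):
    """Yield the payloads of all records in f, verifying both checksums."""
    while True:
        header = f.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        (header_crc,) = struct.unpack("<I", f.read(4))
        if header_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        payload = f.read(length)
        (payload_crc,) = struct.unpack("<I", f.read(4))
        if payload_crc != masked_crc(payload):
            raise ValueError("corrupt record payload")
        yield payload
```

The per-record checksums are why a truncated or corrupted .record file is detected at read time instead of silently producing bad training examples.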


$ cd /workspace
$ git clone https://github.com/MetaPeak/tensorflow_object_detection_create_coco_tfrecord.git
$ cd /workspace/models/cocoapi/PythonAPI # where the COCO API was cloned and built earlier
$ cp /workspace/tensorflow_object_detection_create_coco_tfrecord/create_coco_tf_record.py .
$ python create_coco_tf_record.py --data_dir=/workspace/models/research/object_detection/mscoco_dataset_2017/train2017 --set=train --output_filepath=/workspace/models/research/object_detection/mscoco_dataset_2017mscoco_train.record --shuffle_imgs=True
$ python create_coco_tf_record.py --data_dir=/workspace/models/research/object_detection/mscoco_dataset_2017/val2017 --set=val --output_filepath=/workspace/models/research/object_detection/mscoco_dataset_2017mscoco_val.record --shuffle_imgs=True

Now let’s prepare the training. Download the pretrained model and extract it. Here we retrain SSD MobileNet v2, but you can choose any model from the model zoo.

$ cd /workspace
$ mkdir ssd_mobilenet_v2_coco_2018_03_29
$ cd ssd_mobilenet_v2_coco_2018_03_29
$ wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
$ tar -xvf ssd_mobilenet_v2_coco_2018_03_29.tar.gz # unpacks into a nested ssd_mobilenet_v2_coco_2018_03_29/ folder

Let’s copy all the required files into one directory (optional):

$ cd /workspace/models/research/object_detection
$ mkdir my_ssd_mobilenet_v2_coco_2017
$ cd my_ssd_mobilenet_v2_coco_2017
$ cp -r /workspace/ssd_mobilenet_v2_coco_2018_03_29 . # model checkpoint
$ cp /workspace/models/research/object_detection/mscoco_dataset_2017mscoco_train.record . # mscoco_train.record
$ cp /workspace/models/research/object_detection/mscoco_dataset_2017mscoco_val.record . # mscoco_val.record
$ cp /workspace/models/research/object_detection/data/mscoco_label_map.pbtxt .

Set the pipeline config: edit the pipeline config file and change the paths of four files: the fine-tune model checkpoint (fine_tune_checkpoint), mscoco_train.record and mscoco_val.record (the input_path entries of train_input_reader and eval_input_reader) and the MS COCO label map (label_map_path, which appears in both readers).

$ vi ssd_mobilenet_v2_coco_2018_03_29/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config
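If you prefer not to edit the file by hand, the four paths can be rewritten with a short script. This is a line-based sketch, not a real protobuf text-format parser: it assumes the layout of the model zoo’s pipeline.config, where fine_tune_checkpoint sits in train_config, each input reader has one input_path and one label_map_path, and train_input_reader comes before eval_input_reader. The helper names are our own.

```python
import re

QUOTED = re.compile(r'"[^"]*"')


def _set_value(line, value):
    # A function replacement stops re from treating backslashes in paths as escapes.
    return QUOTED.sub(lambda match: '"%s"' % value, line, count=1)


def set_pipeline_paths(config_text, checkpoint, train_record, val_record, label_map):
    """Return config_text with the checkpoint, record and label map paths replaced."""
    lines, reader = [], None
    for line in config_text.splitlines():
        key = line.strip()
        if key.startswith("train_input_reader"):
            reader = "train"
        elif key.startswith("eval_input_reader"):
            reader = "eval"
        if key.startswith("fine_tune_checkpoint:"):
            line = _set_value(line, checkpoint)
        elif key.startswith("label_map_path:"):
            line = _set_value(line, label_map)
        elif key.startswith("input_path:") and reader is not None:
            line = _set_value(line, train_record if reader == "train" else val_record)
        lines.append(line)
    return "\n".join(lines) + "\n"


def update_pipeline_file(path, **paths):
    """Rewrite a pipeline.config file in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_pipeline_paths(text, **paths))
```

With the layout from the commands above, checkpoint would be the model.ckpt prefix inside the extracted model folder, and the record and label map paths would be the files copied into my_ssd_mobilenet_v2_coco_2017. Using absolute paths avoids surprises when train.py is started from another directory.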

Start training: run train.py from the /workspace/models/research/object_detection directory (in later versions of the models repository, train.py and eval.py were moved to object_detection/legacy/):

$ python train.py --logtostderr --train_dir=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/train_check_point/ --pipeline_config_path=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/ssd_mobilenet_v2_coco_2018_03_29/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config
# or if you want to train your model in the background:
$ nohup python train.py --logtostderr --train_dir=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/train_check_point/ --pipeline_config_path=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/ssd_mobilenet_v2_coco_2018_03_29/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config >| train.out 2>&1 &
$ less train.out # view the training log; press q to quit less, training keeps running in the background

For evaluation:

$ python eval.py --logtostderr --checkpoint_dir=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/train_check_point/ --eval_dir=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/eval_check_point/ --pipeline_config_path=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/ssd_mobilenet_v2_coco_2018_03_29/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config
# or if you want to run the evaluation in the background:

$ nohup python eval.py --logtostderr --checkpoint_dir=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/train_check_point/ --eval_dir=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/eval_check_point/ --pipeline_config_path=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/ssd_mobilenet_v2_coco_2018_03_29/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config >| eval.out 2>&1 &
$ less eval.out # view the evaluation log; press q to quit less, evaluation keeps running in the background

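Monitoring with TensorBoard:

$ tensorboard --logdir=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017 >| tensorboard.out 2>&1 &

TensorBoard serves on port 6006, which we published when starting the container, so you can open http://[your instance public IP address]:6006 in your browser once the instance’s security group allows inbound traffic on that port (with TensorBoard 2.x, also add --bind_all).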

Export the inference graph: once the model is trained, export it as a frozen inference graph. From the object_detection directory, run the following command, replacing xxxxxxx with the step number of the checkpoint you want to export:

$ python export_inference_graph.py --input_type image_tensor --pipeline_config_path=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/ssd_mobilenet_v2_coco_2018_03_29/ssd_mobilenet_v2_coco_2018_03_29/pipeline.config --trained_checkpoint_prefix=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/train_check_point/model.ckpt-xxxxxxx --output_directory=/workspace/models/research/object_detection/my_ssd_mobilenet_v2_coco_2017/fine_tuned_model
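Each checkpoint consists of several model.ckpt-<step>.* files in the training directory, and the number in --trained_checkpoint_prefix is that step. The sketch below (our own helper) picks the prefix with the highest step by looking at the .index files; TensorFlow’s tf.train.latest_checkpoint(train_dir) gives a similar answer by reading the checkpoint bookkeeping file instead.

```python
import re
from pathlib import Path

CKPT_INDEX = re.compile(r"model\.ckpt-(\d+)\.index")


def latest_checkpoint_prefix(train_dir):
    """Return the path prefix of the highest-step model.ckpt-<step> in train_dir, or None."""
    best_step, best_prefix = -1, None
    for index_file in Path(train_dir).glob("model.ckpt-*.index"):
        match = CKPT_INDEX.fullmatch(index_file.name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_prefix = str(index_file)[:-len(".index")]
    return best_prefix
```

The returned string can be passed directly as --trained_checkpoint_prefix.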

You can then run the Jupyter notebook object_detection_tutorial.ipynb from the object_detection directory to run inference on your own images. A step-by-step guide to this process is also available here.
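Whichever way you run inference, the exported graph returns normalized boxes ([ymin, xmin, ymax, xmax]), scores and numeric class IDs (the detection_boxes, detection_scores and detection_classes outputs), and the class IDs are translated into names with the label map. The sketch below uses our own helper names and a regular expression rather than a full protobuf parser. It reads a label map such as mscoco_label_map.pbtxt and keeps the detections above a score threshold; the 0.5 default is an arbitrary choice.

```python
import re

ITEM = re.compile(r"item\s*\{([^}]*)\}")


def parse_label_map(text):
    """Map class id -> display name from a .pbtxt label map (falls back to `name`)."""
    labels = {}
    for body in ITEM.findall(text):
        id_match = re.search(r"\bid:\s*(\d+)", body)
        name_match = (re.search(r'display_name:\s*"([^"]*)"', body)
                      or re.search(r'\bname:\s*"([^"]*)"', body))
        if id_match and name_match:
            labels[int(id_match.group(1))] = name_match.group(1)
    return labels


def filter_detections(boxes, scores, classes, labels, threshold=0.5):
    """Keep (label, score, box) for every detection whose score reaches the threshold."""
    results = []
    for box, score, cls in zip(boxes, scores, classes):
        if score >= threshold:
            results.append((labels.get(int(cls), "unknown"), float(score), tuple(box)))
    return results
```

The graph returns classes as floats, hence the int() conversion before the label lookup.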

Source: Tensorflow/models

Source: Deep Learning on Medium