Source: Deep Learning on Medium
This article is a step-by-step guide on how to use the TensorFlow Object Detection API to identify particular classes of objects in an image.
See Part 1, which describes how to set up your TensorFlow environment for object detection on Ubuntu 16.04.
Feel free to use Windows, Google Colab or any cloud service on which you have set up TensorFlow for object detection, but you may have to adapt the code in this post to work in your environment.
In this tutorial, we will walk you through training a pedestrian (person) detector model using transfer learning. The steps are:
- Gathering & labeling of data (pictures)
- Creating TFRecords files
- Training and Evaluating the model
- Exporting Inference Graph
Gathering & labeling of data (pictures)
With your environment set up, the next step is to prepare your training data set (pictures). TensorFlow needs hundreds (preferably thousands) of images with varied backgrounds to train a model with good precision. The training data set needs to be split into two folders (train and test). To do that, move to your models/research/object_detection directory and create a folder called images containing two subfolders (train and test).
Note: Make sure all your images are .jpg images. The usage of images with other extensions may lead to problems during training.
Move 80% of the images to the train folder and 20% to the test folder (both folders are in the images directory). For this tutorial, we took our image data from the INRIA Person Dataset and the Penn-Fudan Database and converted the images to .jpg.
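The 80/20 split can be scripted rather than done by hand. The sketch below is one possible way to do it, assuming all the .jpg files start out in a single source folder and that a fixed random seed is acceptable. The function name split_images and its parameters are our own, not part of TensorFlow.

```python
import os
import random
import shutil


def split_images(source_dir, images_dir, train_fraction=0.8, seed=0):
    """Move .jpg files from source_dir into images_dir/train and images_dir/test."""
    files = sorted(f for f in os.listdir(source_dir) if f.lower().endswith('.jpg'))
    random.Random(seed).shuffle(files)  # reproducible shuffle
    n_train = int(len(files) * train_fraction)
    for subset, names in (('train', files[:n_train]), ('test', files[n_train:])):
        target = os.path.join(images_dir, subset)
        os.makedirs(target, exist_ok=True)
        for name in names:
            shutil.move(os.path.join(source_dir, name), os.path.join(target, name))
    return n_train, len(files) - n_train

# Example call (hypothetical source folder name):
# split_images('all_images', 'images')
```

Run the split before labeling, so that each XML file is saved next to its image in the right folder.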
Now we need to label our images. We will use a simple, freely available tool called labelImg.
The process of labeling the images consists of drawing a rectangular box around each target object, as shown in the picture above. Click on “Create RectBox”, draw the box on the target image, label the targeted object and click on save when done. Repeat this process for all the images in the train and test directories.
You can download the labeled dataset by clicking here.
Creating TFRecords files
After labeling all the images and saving their corresponding XML files, we need to generate a tfrecord file for both the train and test images. To generate the tfrecord files, we first need to create .csv files from the XML files created during labeling.
To do this, we need two scripts (xml_to_csv.py and generate_tfrecord.py) available from Dat Tran’s raccoon detector. Download these scripts and move them to the models/research/object_detection directory, where your images directory is found.
Generating CSV files
Update the main method of xml_to_csv.py as below:
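A minimal sketch of such a main method, written as a self-contained stand-in rather than the original script. It assumes labelImg saved Pascal VOC XML files next to the images, and it uses the standard csv module where the original script relies on pandas. The helper name xml_to_rows is our own; the column names follow the CSV layout that generate_tfrecord.py reads.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

COLUMNS = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']


def xml_to_rows(folder):
    """Collect one row per labeled box from the Pascal VOC XML files in folder."""
    rows = []
    for xml_file in sorted(glob.glob(os.path.join(folder, '*.xml'))):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        for obj in root.findall('object'):
            box = obj.find('bndbox')
            rows.append([
                root.find('filename').text,
                int(size.find('width').text),
                int(size.find('height').text),
                obj.find('name').text,
                int(box.find('xmin').text),
                int(box.find('ymin').text),
                int(box.find('xmax').text),
                int(box.find('ymax').text),
            ])
    return rows


def main(images_dir='images'):
    # Write images/train_labels.csv and images/test_labels.csv.
    for subset in ['train', 'test']:
        rows = xml_to_rows(os.path.join(images_dir, subset))
        out_path = os.path.join(images_dir, subset + '_labels.csv')
        with open(out_path, 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(COLUMNS)
            writer.writerows(rows)
        print('Wrote %d rows to %s' % (len(rows), out_path))
```

The key point is the loop over the train and test subfolders, which produces one .csv file per subset.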
From here, create the .csv files by running the command below from the models/research/object_detection directory:
python xml_to_csv.py
This will create train_labels.csv and test_labels.csv files in the images directory.
Generating tfrecord files
To create tfrecord files from the .csv files generated above, update the label mapping in generate_tfrecord.py. Replace the existing branches (shown here without their return statements):
# TO-DO replace this with label map
if row_label == 'door':
elif row_label == 'floor':
elif row_label == 'stairecase':
elif row_label == 'window':
with a single branch that returns the id of the person class:
# TO-DO replace this with label map
if row_label == 'person':
    return 1
If you are not using our dataset, change the class name to match your dataset labels.
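With more than one class, the chain of if/elif branches can be replaced by a dictionary lookup. This is a sketch of that alternative, not the code from the original script. The LABEL_IDS name is our own, and the ids must match the label map created in the next section.

```python
# Hypothetical mapping from label text to integer id. Ids start at 1
# because the Object Detection API reserves 0 for the background class.
LABEL_IDS = {'person': 1}


def class_text_to_int(row_label):
    """Return the integer id for a label, or None if the label is unknown."""
    return LABEL_IDS.get(row_label)
```

Adding a class then means adding one entry to the dictionary instead of another branch.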
Now create the tfrecord files by running the commands below from the models/research/object_detection directory:
python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=images/train.record
python generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test --output_path=images/test.record
This creates train.record and test.record files in the images directory.
We need to configure how our training will work before we can move forward. Create a folder called training in the models/research/object_detection directory. This folder will contain the config files needed for training.
Creating the label map
The label map is a file that maps each class id to a name. Create a file named labelmap.pbtxt in the training folder created above and add the text below to it.
item {
  id: 1
  name: 'person'
}
The id number of each item should match the id specified in the generate_tfrecord.py file.
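When a dataset has several classes, writing the label map by hand becomes error-prone. The sketch below generates the file contents from a list of class names. The helper names are our own, and ids are assigned in list order starting from 1, on the assumption that generate_tfrecord.py uses the same numbering.

```python
def label_map_text(class_names):
    """Build label map text with ids assigned in order, starting from 1."""
    items = []
    for index, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (index, name))
    return "\n".join(items)


def write_label_map(class_names, path):
    """Write the label map for class_names to path."""
    with open(path, 'w') as f:
        f.write(label_map_text(class_names))

# Example call for this tutorial's single class:
# write_label_map(['person'], 'training/labelmap.pbtxt')
```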
Prepare for transfer learning
Our training process makes use of transfer learning, which is the use of an already trained model as the starting point for training on your own data. This shortens the training process and usually produces better results.
For this task we’ll use a Single Shot Detector (SSD) with MobileNet (a model optimized for inference on mobile devices) pretrained on the COCO dataset, called ssd_mobilenet_v2_quantized_coco. First, we will download and extract the latest checkpoint that has been pre-trained on the COCO dataset. Run the command below from the object_detection directory:
curl -O http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03.tar.gz
When we train our model, it’ll use these checkpoints as its starting point for training.
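The curl command only downloads the archive; it still has to be unpacked before the checkpoint files can be used. A sketch of both steps using only the Python standard library follows. The function names are our own, and download needs network access, so the sketch defines it without calling it.

```python
import os
import tarfile
import urllib.request

CHECKPOINT_URL = ('http://download.tensorflow.org/models/object_detection/'
                  'ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03.tar.gz')


def download(url, dest_dir='.'):
    """Download url into dest_dir and return the local archive path."""
    archive_path = os.path.join(dest_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract(archive_path, dest_dir='.'):
    """Extract a .tar.gz archive and return the names of its members."""
    with tarfile.open(archive_path, 'r:gz') as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names

# Example call, run from the object_detection directory:
# extract(download(CHECKPOINT_URL))
```

The returned member names show where model.ckpt ended up, which is the path needed later for fine_tune_checkpoint.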
Create training config file
Since we have chosen to train from the ssd_mobilenet_v2_quantized_coco model, we will use the config file ssd_mobilenet_v2_quantized_300x300_coco.config, which can be found in the samples/configs folder inside the object_detection folder. Copy this file to the training folder and update it as follows:
Line 9: Change num_classes to 1, since we are detecting just one class of object.
Line 156: Update fine_tune_checkpoint to the path of the model.ckpt file from the extracted checkpoint.
Line 175: Update input_path to the path of the train.record file (images/train.record in this tutorial).
Line 177 & 191: Update label_map_path to the path of labelmap.pbtxt (training/labelmap.pbtxt in this tutorial).
Line 189: Update input_path to the path of the test.record file (images/test.record in this tutorial).
Add the line below to the eval_config if you want to use the COCO evaluation metrics:
metrics_set: "coco_detection_metrics"
Line 181: Update num_examples to the number of images in your test directory.
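These edits can also be applied with a short script, which helps when line numbers shift between versions of the config. The sketch below is illustrative only: SAMPLE_CONFIG is an abbreviated excerpt we wrote for the example, not the full file, the helper names are our own, and the code assumes the train reader appears before the eval reader in the file.

```python
import re

# Abbreviated, illustrative excerpt of a pipeline config (not the full file).
SAMPLE_CONFIG = '''\
model {
  ssd {
    num_classes: 90
  }
}
train_config {
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
}
train_input_reader {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
eval_config {
  num_examples: 8000
}
eval_input_reader {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
'''


def set_nth_value(config_text, key, value, n=0):
    """Replace the value on the n-th `key: value` line (0-based)."""
    pattern = re.compile(r'^([ \t]*' + re.escape(key) + r':[ \t]*)(.*)$', re.MULTILINE)
    match = list(pattern.finditer(config_text))[n]
    return config_text[:match.start(2)] + value + config_text[match.end(2):]


def patch_config(text, checkpoint, num_test_images):
    """Apply this tutorial's edits; the paths follow the folder layout used above."""
    text = set_nth_value(text, 'num_classes', '1')
    text = set_nth_value(text, 'fine_tune_checkpoint', '"%s"' % checkpoint)
    text = set_nth_value(text, 'input_path', '"images/train.record"', n=0)
    text = set_nth_value(text, 'input_path', '"images/test.record"', n=1)
    text = set_nth_value(text, 'label_map_path', '"training/labelmap.pbtxt"', n=0)
    text = set_nth_value(text, 'label_map_path', '"training/labelmap.pbtxt"', n=1)
    text = set_nth_value(text, 'num_examples', str(num_test_images))
    return text
```

Matching on field names rather than line numbers keeps the script usable if the sample config is reformatted.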
In this project, we decided to use the COCO evaluation metrics to evaluate the accuracy of our model during training, so we installed the COCO API as follows:
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools <path_to_tensorflow>/models/research/
Now it's time to train the model. To train our model, copy the train.py file from the object_detection/legacy folder to the object_detection folder, then run the command below from the models/research/object_detection folder.
python train.py --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v2_quantized_300x300_coco.config
If everything was properly configured, your training should start in a few seconds.
Now open a new terminal and navigate to the object_detection directory. Copy the eval.py file from the object_detection/legacy folder to the object_detection folder, then run the following command to start the evaluation.
#Start the evaluation
python eval.py --logtostderr --pipeline_config_path=training/ssd_mobilenet_v2_quantized_300x300_coco.config --checkpoint_dir=training/ --eval_dir=training/
To open TensorBoard, open a new terminal, navigate to the object_detection folder and run the command:
tensorboard --logdir=training/
This will serve a web page accessible at http://localhost:6006
Now click on the Images tab
In the images above, the rectangular boxes show the persons detected by the model, along with the confidence score of each detection.
Note: Training is a lengthy process, especially when your computer has no GPU. Without a GPU, training can take about 24 hours or more to produce good results.
Let the training run until you are satisfied with the mAP (mean average precision) and the loss. mAP summarizes detection quality: it is the average precision (the area under the precision-recall curve) averaged over classes and, for the COCO metrics, over a range of IoU thresholds. Stop the process by pressing CTRL+C.
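COCO-style mAP decides whether a detection counts as correct by its intersection over union (IoU) with a ground-truth box. A small sketch of that overlap measure follows, assuming boxes are given as (xmin, ymin, xmax, ymax) tuples as in the .csv files created earlier. The function name iou is our own.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so boxes that do not overlap give an empty intersection.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection whose IoU with a labeled box reaches the chosen threshold (0.5 is a common choice) is treated as a match; the COCO metrics repeat this for thresholds from 0.5 to 0.95 and average the results.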
Export Inference Graph
Now that we have successfully trained our model, we need to generate the inference graph. The inference graph is the file used by applications that want to run our model. To do that, we need to look in the training folder for the saved checkpoint with the highest step number. The checkpoint files are usually named model.ckpt-XXXX, where XXXX is the step number. Once we have it, we run the following command from the object_detection folder to generate the inference graph:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_mobilenet_v2_quantized_300x300_coco.config --trained_checkpoint_prefix training/model.ckpt-XXXX --output_directory inference_graph
XXXX represents the highest step number.
This creates an inference graph file named frozen_inference_graph.pb in the inference_graph folder. You can use this file to run the trained model in your own applications.
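The highest step number used in place of XXXX can be read from the file names instead of looking through the training folder by hand. A sketch, assuming the checkpoint files follow the model.ckpt-XXXX naming described above with an extension after the step number; the function name is our own.

```python
import os
import re


def latest_checkpoint_step(training_dir):
    """Return the highest XXXX among model.ckpt-XXXX.* files, or None if there are none."""
    steps = []
    for name in os.listdir(training_dir):
        match = re.match(r'model\.ckpt-(\d+)\.', name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None

# Example call, run from the object_detection directory:
# step = latest_checkpoint_step('training')
# prefix = 'training/model.ckpt-%d' % step
```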
TensorFlow provides a suitable framework for training your own model. What was done here is just the tip of the iceberg; much more can be done with TensorFlow.
If you are interested in mobile, check how this inference graph can be brought to a mobile device by converting it to a TensorFlow Lite file.