Custom object detection for non-data scientists



By Italo José

Using the TensorFlow Object Detection API to train a model on your own dataset.

What you can do at the end of this article

In general terms, by the end of this tutorial you will basically be able to take your own dataset, load it in a Jupyter notebook, train a model and use it :)
The picture above is the result of the example we are going to work through here.

WARNING

This article is just an invitation to learn about the TF Object Detection API; the content here is little more than a copy-paste of my Jupyter notebook. If you want something more structured and nicer to read, go to my Jupyter notebook, which has the same text as here but in a format that is prettier and easier to follow than Medium :)


To begin with, let’s install the dependencies

!pip install pillow
!pip install lxml
!pip install Cython
!pip install jupyter
!pip install matplotlib
!pip install pandas
!pip install opencv-python
!pip install tensorflow

Downloading the TensorFlow Object Detection API

First of all, let's download the TensorFlow models repository. Inside this repository is the Object Detection API, which we will use to train our own object detection model.
Everything we do happens inside the path models/research/object_detection.

!git clone https://github.com/tensorflow/models/
%cd models/research/object_detection

Here we're just creating some folders that we will use later (the mkdir command creates directories).

!mkdir training
!mkdir inference_graph
!mkdir -p images/train
!mkdir -p images/test

Choosing our pre-trained model.

Inside the TensorFlow model zoo we can choose a pre-trained model to download and use as the starting point for training on our own dataset.
The model zoo page has a table that explains how precise each model is (via mAP, mean Average Precision) and how fast it is.
You can choose any model you want; the process in this tutorial is the same for other models.
For this tutorial I chose faster_rcnn_inception_v2_coco_2018_01_28, just because I wanted to :)

!wget http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz
!tar -xvzf faster_rcnn_inception_v2_coco_2018_01_28.tar.gz
!rm -rf faster_rcnn_inception_v2_coco_2018_01_28.tar.gz

According to the documentation, it is important that we export the PYTHONPATH environment variable with the models, research and slim paths.

import os
# Append the three paths to whatever PYTHONPATH already contains (it may be unset).
os.environ['PYTHONPATH'] = "{}:/content/obj_detect_api/models:/content/obj_detect_api/models/research:/content/obj_detect_api/models/research/slim".format(os.environ.get('PYTHONPATH', ''))

Compiling the proto buffers

Here we have some proto buffers that need to be compiled. Remember that they are compiled, so if you switch machines you can't just copy and paste the generated files.
To be honest, I couldn't run these protos from the “/research/object_detection” folder; I tried N ways and it didn't work, so I simply compiled them from within the “/research” folder.
Understanding proto buffers is not necessary for this tutorial, but if you want to learn more I recommend looking at the documentation. Basically they are text structures (like JSON or XML) describing message structures that can be compiled into several languages. There is a little more to it than that, but for now this is more than enough to follow the article.

%cd ..
!protoc ./object_detection/protos/*.proto --python_out=.

I don't remember the exact protoc version that the TensorFlow Object Detection API requires, but I know that version >= 3.0 works well.

!protoc --version

So just install it

!python3 setup.py build
!python3 setup.py install

Test whether the installation is working.

Once we have installed everything, we can run some sample scripts from the TensorFlow Object Detection API to verify that everything is correct.
This code isn't mine; I copied it from the object_detection folder and only made some modifications to run it in a Jupyter notebook.

I won't explain this code because it is just there to test the installation; we will see similar code later.

Just execute these cells.

Warning: Just make sure that you are using TensorFlow >= 1.12.0

Obs: Remember that you can leave this article and go to the Jupyter notebook, which is better :)

If, after executing the test cells, you can see a dog image and a beach image, everything is working!
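The test cells themselves live in the notebook. If you just want a quick sanity check from the command line, the repository also ships a small test script you can run from the models/research folder (with the PYTHONPATH set as above); this is my suggestion, not a cell from the original notebook:

!python3 object_detection/builders/model_builder_test.py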

Training the model with our own dataset.

First of all let’s download our dataset.

About the dataset

In this article we will use a dataset from Kaggle, but don't worry: during the tutorial I will point you to another article that teaches you how to create your own labeled dataset.

Pascal-VOC

Before talking about the dataset, I just want to remind you that in this tutorial we will use datasets in the Pascal-VOC format. This is a well-known format where you have:

  • Images in jpg, jpeg, png format …
  • Annotations: .xml files in the following format:
<annotation>
  <folder>GeneratedData_Train</folder>
  <filename>000001.png</filename>
  <path>/my/path/GeneratedData_Train/000001.png</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>224</width>
    <height>224</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>21</name>
    <pose>Frontal</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occluded>0</occluded>
    <bndbox>
      <xmin>82</xmin>
      <xmax>172</xmax>
      <ymin>88</ymin>
      <ymax>146</ymax>
    </bndbox>
  </object>
</annotation>

For each image we have an .xml file with the same name, for example: image001.png -> image001.xml.
Note that inside this .xml we have other information about the image, such as its location, size, and the objects it contains together with their locations inside the image.

The dataset we are using in this article is the LISA Traffic Light Dataset taken from Kaggle. This dataset contains images and annotations of traffic lights with the following classes:

  • go;
  • stop;
  • warning;
  • goLeft;
  • goForward;
  • stopLeft;
  • warningLeft;

But to keep this tutorial simpler, I modified the classes so that we only have:

  • go;
  • stop;
  • warning;

After the download, note that we move everything into the …/images folder.

%cd object_detection
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=15WlpBbq4EpxUxZeKEAbfI_YJABASpmFs' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=15WlpBbq4EpxUxZeKEAbfI_YJABASpmFs" -O lisa608.zip
!unzip -qq ./lisa608.zip
!mv images_output/* images/
!mv annotations_output/* images/

For this simple tutorial I do not want to work with many classes. You can keep them all if you want, but to simplify the process here I will rename all the classes that end with “Left” or “Forward”, as sketched below.
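The renaming cell itself is in the notebook; here is a minimal sketch of the idea, assuming the Pascal-VOC .xml annotations already sit under images/ (the folder path and the mapping below just follow the class lists above):

import glob
import xml.etree.ElementTree as ET

# Collapse the directional classes into the three base classes.
rename_map = {'goLeft': 'go', 'goForward': 'go',
              'stopLeft': 'stop', 'warningLeft': 'warning'}

for xml_file in glob.glob('images/**/*.xml', recursive=True):
    tree = ET.parse(xml_file)
    changed = False
    for name_tag in tree.getroot().iter('name'):
        if name_tag.text in rename_map:
            name_tag.text = rename_map[name_tag.text]
            changed = True
    if changed:
        tree.write(xml_file)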

Here I just split my dataset into train and test.
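Again, the actual cell is in the notebook; a minimal sketch of a random split, assuming the images and their .xml files currently sit together in images/ (the 80/20 ratio is my choice, not from the article):

import glob
import os
import random
import shutil

image_files = glob.glob('images/*.jpg') + glob.glob('images/*.png')
random.shuffle(image_files)
split = int(0.8 * len(image_files))  # 80% train, 20% test

for i, img_path in enumerate(image_files):
    dest = 'images/train' if i < split else 'images/test'
    shutil.move(img_path, dest)
    # Move the matching Pascal-VOC annotation alongside the image.
    xml_path = os.path.splitext(img_path)[0] + '.xml'
    if os.path.exists(xml_path):
        shutil.move(xml_path, dest)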

If your annotations are in Pascal-VOC format (as is our case), you will need to convert them into CSV.

I know, our original dataset was already in CSV, but I converted it to XML just to show you this scenario.
Here we iterate over each folder (train, test and validation, if we had validation data) and extract:

  • filename
  • width
  • height
  • class
  • xmin
  • ymin
  • xmax
  • ymax

and write them as a row of our CSV, as sketched below.
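A minimal sketch of that conversion (the folder layout and the CSV file names are my assumptions; the full version is in the notebook):

import glob
import os
import xml.etree.ElementTree as ET
import pandas as pd

def xml_to_csv(folder):
    rows = []
    for xml_file in glob.glob(os.path.join(folder, '*.xml')):
        root = ET.parse(xml_file).getroot()
        size = root.find('size')
        for obj in root.findall('object'):
            box = obj.find('bndbox')
            rows.append({
                'filename': root.find('filename').text,
                'width': int(size.find('width').text),
                'height': int(size.find('height').text),
                'class': obj.find('name').text,
                'xmin': int(box.find('xmin').text),
                'ymin': int(box.find('ymin').text),
                'xmax': int(box.find('xmax').text),
                'ymax': int(box.find('ymax').text),
            })
    return pd.DataFrame(rows)

for split in ['train', 'test']:
    xml_to_csv(os.path.join('images', split)).to_csv(
        'images/{}_labels.csv'.format(split), index=False)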

Generating TFRecords

To use the TF Object Detection API we will need to provide the input in the TFRecord format.

When we are working with a lot of data, it is important to use a format that is light and fast. One option is to work with the binary form of the documents, which is exactly what TFRecord does; in addition, it is optimized for TensorFlow because it was created for TensorFlow. For example, when you work with a very large dataset and try to load it all into memory, you obviously won't manage it because you don't have enough RAM, so you would have to handle batches yourself. If you use TFRecords you don't need to manage those batches; it abstracts how the data is loaded into memory without you having to program it yourself.

Here is an article that explains in more detail how TFRecords work.
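Just to make the format concrete, here is a tiny sketch of writing and reading a record with the TF 1.x API (illustrative only; the real generation script is in the notebook):

import tensorflow as tf

# Each entry in a TFRecord file is a serialized tf.train.Example protobuf.
with tf.python_io.TFRecordWriter('example.record') as writer:
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/filename': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b'000001.png'])),
        'image/object/class/label': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[1])),
    }))
    writer.write(example.SerializeToString())

# Read it back, record by record.
for record in tf.python_io.tf_record_iterator('example.record'):
    print(tf.train.Example.FromString(record))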

For each project, some changes must be made to the class_text_to_int() method. Note that it is a very simple logical structure where we return an integer depending on which class is passed in.

Obs: you can see the whole code in the Jupyter notebook.

We also need to tell TF what our classes are. To do that we create a .pbtxt file with the following structure:

item {
  id: 1
  name: 'stop'
}

For each class we have an item, and each item has an id and a name. The id refers to the id that we use in our TFRecords:

def class_text_to_int(row_label):
    if row_label == 'stop':
        return 1
    elif row_label == 'warning':
        return 2
    elif row_label == 'go':
        return 3
    else:
        return None

Remember to use the same ids.

%%writefile training/labelmap.pbtxt
item {
  id: 1
  name: 'stop'
}
item {
  id: 2
  name: 'warning'
}
item {
  id: 3
  name: 'go'
}
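In the notebook the records themselves are generated by a generate_tfrecord.py-style script (the one that defines class_text_to_int above). The script name and flags below follow the commonly used version of that script and are an assumption, not something shown in this article:

!python3 generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record
!python3 generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test --output_path=test.record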

Here we print how many images are in the test folder; we will use this information later.

import os
test_path = "images/test"
len_dir = len(os.listdir(test_path))
print("{} images inside {}".format(len_dir, test_path))

The next cell contains the configuration file of the neural network; it has some hyper-parameters that we must modify.

I copied this configuration file from faster_rcnn_inception_v2_pets.config and edited it with my modifications. Each deep learning architecture you use needs a different configuration file; you can find all of them in [...]/research/object_detection/samples/configs or, if you prefer, access them via GitHub.

Within these files, a few things need to be changed, such as num_classes:

faster_rcnn {
  num_classes: 3

fine_tune_checkpoint

fine_tune_checkpoint: "/home/<full_path>/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt"

Inside the train_input_reader property, change the input_path and the label_map_path to:

train_input_reader: {
  tf_record_input_reader {
    input_path: "/<full_path>/research/object_detection/train.record"
  }
  label_map_path: "/<full_path>/research/object_detection/training/labelmap.pbtxt"
}

Remember that train_input_reader must point to the train.record file.

Inside the eval_config property, change num_examples to the number of instances (images) you have for testing (inside the …/images/test folder).
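For example (placeholder value; use the number printed by the image-count cell above):

eval_config: {
  num_examples: <number of images in images/test>
}

And inside eval_input_reader, change the input_path and the label_map_path to: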

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/full_path/research/object_detection/test.record"
  }
  label_map_path: "/full_path/research/object_detection/training/labelmap.pbtxt"
}

Note that I am using test.record instead of train.record

Obs: just a reminder that you can see the code and run it all on Google Colaboratory.

%cd ..
!protoc ./object_detection/protos/*.proto --python_out=.
%cd object_detection

I'm sorry for this one, but I was getting a very annoying error saying that the slim lib could not be found. Many people said the problem was in my PYTHONPATH environment variable, but I checked it a thousand times and did not find the problem, so I solved it by copying all the /research/slim code into the /object_detection folder. If anyone identifies what I am doing wrong, please comment below.

!cp -a ../slim/. .

Let’s train it!

%run legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2.config

After training your model, note that inside the /training folder we have a model.ckpt-22230.data-00000-of-00001 file (on your machine it will probably have a different number). This file is the saved model from your training!

!ls training

As we copied the whole /slim folder into /object_detection, we overwrote the inference graph export script, so here we re-create it (as my_inference_graph.py, used below).
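If you did not have to work around the overwrite, the stock export script that ships with the API takes the same flags as the cell further below (this is the standard object_detection/export_inference_graph.py, not a file from the notebook):

!python3 export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path training/faster_rcnn_inception_v2_pets.config \
--trained_checkpoint_prefix training/model.ckpt-22230 \
--output_directory ./inference_graph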

To learn more about inference, I recommend these two articles, which are very good: Article 1, Article 2.

Let's generate our inference graph! Remember to change the trained_checkpoint_prefix property to the model.ckpt-XXX that is inside the /training folder.

!python3 my_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path training/faster_rcnn_inception_v2_pets.config \
--trained_checkpoint_prefix training/model.ckpt-22230 \
--output_directory ./inference_graph

We will download a test image here to see some results from our model.

!wget 'http://marcusquintella.sigonline.com.br/openged/conteudos/1306/001306_59bc593899b85_Semaforos_out_2017.jpg' -O semaforo.png

This code is very similar to the test code!

import os
import cv2
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
import sys
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")

Let's define some constants and paths: the checkpoint of our model, our labels and more.

# Import utilites
from utils import label_map_util
from utils import visualization_utils as vis_util
# Name of the directory containing the object detection module we're using
MODEL_NAME = 'inference_graph'
IMAGE_NAME = 'semaforo.png'
# Grab path to current working directory
CWD_PATH = os.getcwd()
# Path to frozen detection graph .pb file, which contains the model that is used
# for object detection.
PATH_TO_CKPT = os.path.join(CWD_PATH,MODEL_NAME,'frozen_inference_graph.pb')
# Path to label map file
PATH_TO_LABELS = os.path.join(CWD_PATH,'training','labelmap.pbtxt')
# Path to image
PATH_TO_IMAGE = os.path.join(CWD_PATH,IMAGE_NAME)
# Number of classes the object detector can identify
NUM_CLASSES = 3

Here we load our label map so that we can decode the model's predictions, for example to know that the number 3 corresponds to the class “go”.

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

Let's load our model and grab the tensors we need from it.

# Load the Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
    sess = tf.Session(graph=detection_graph)
# Define input and output tensors (i.e. data) for the object detection classifier
# Input tensor is the image
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Output tensors are the detection boxes, scores, and classes
# Each box represents a part of the image where a particular object was detected
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represents level of confidence for each of the objects.
# The score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
# Number of objects detected
num_detections = detection_graph.get_tensor_by_name('num_detections:0')

Below we load the image and run our model. If you don't know anything about TensorFlow, I suggest you read this article (it's a quick one), and read the documentation if you would like more details on how a TensorFlow session works.

Some parameters are configurable here, such as line_thickness, which defines the width of the box lines, and min_score_thresh, which is the confidence threshold above which TensorFlow will say: hey, there is an object here with more than X% confidence (in our case we will use 0.6).

# Load image using OpenCV and
# expand image dimensions to have shape: [1, None, None, 3]
# i.e. a single-column array, where each item in the column has the pixel RGB value
image = cv2.imread(PATH_TO_IMAGE)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image_expanded = np.expand_dims(image, axis=0)
# Perform the actual detection by running the model with the image as input
(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: image_expanded})
# Draw the results of the detection (aka 'visualize the results')
vis_util.visualize_boxes_and_labels_on_image_array(
    image,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8,
    min_score_thresh=0.6)

Let's display our image with the respective detections. In a real application it is not necessary to use visualize_boxes_and_labels_on_image_array(); you can use the boxes, classes and scores separately (a sketch of that follows below).

%matplotlib inline
plt.figure(figsize=(20,10))
plt.imshow(image)
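If you prefer to consume the raw outputs instead of the visualization helper, here is a minimal sketch (my own, not from the notebook), assuming the boxes, scores and classes arrays returned by sess.run above; the boxes are normalized [ymin, xmin, ymax, xmax]:

height, width = image.shape[:2]
for box, score, cls in zip(np.squeeze(boxes), np.squeeze(scores),
                           np.squeeze(classes).astype(np.int32)):
    if score < 0.6:
        continue
    ymin, xmin, ymax, xmax = box
    # Convert normalized coordinates to pixels and look up the class name.
    label = category_index[cls]['name']
    print('{} ({:.0%}) at x=[{}, {}], y=[{}, {}]'.format(
        label, score,
        int(xmin * width), int(xmax * width),
        int(ymin * height), int(ymax * height)))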

And here is the original image:

image = cv2.imread(PATH_TO_IMAGE)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(20,10))
plt.imshow(image)

Now you can download this inference graph and use it on your machine, on a server, wherever you want.

!zip -r inference_graph.zip /content/obj_detect_api/models/research/object_detection/inference_graph

Then send it to Google Drive.

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
model_file = drive.CreateFile({'title' : 'inference_graph.zip'})
model_file.SetContentFile('./inference_graph.zip')
model_file.Upload()
# Get a handle to the uploaded file in Google Drive
drive.CreateFile({'id': model_file.get('id')})