Object Detection for a Robot using Deep Learning

Original article was published by Surajit Saikia on Deep Learning on Medium


In this post, we will enable a robot named Vector to detect and recognize a large number of objects. At the end, you will see how he names the objects he has detected.

Who is Vector?

Vector is a cute, AI-powered robot who can be your companion. He is curious and independent, and he can make you laugh with his actions. Best of all, you can customize him using AI, and we will see how to make this robot detect and recognize various objects from our day-to-day life.

Object Detection using Deep Learning

To detect objects, we will use an object detection algorithm trained on the Google Open Images dataset. The network consists of a ResNet backbone with a Region Proposal Network and can detect more than 600 object categories, which means Vector will be able to identify a large number of objects. However, we need a few more dependencies to make Vector recognize those objects. The versions below are from my testing platform (Python 3.6), but you can adjust them for the machine on which you will be running the code.

  1. Tensorflow — 1.12.0 (you can install either the CPU or the GPU version)
  2. Keras-2.2.4
  3. OpenCV3
  4. Vector SDK
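The object detection module used later returns class labels directly. Internally, a Faster R-CNN-style detector like this one outputs bounding boxes, confidence scores, and class indices, which are then filtered by a confidence threshold and mapped to human-readable names. Here is a minimal sketch of that post-processing step; the class map, threshold, and function name are illustrative, not the actual module's API:

```python
# Illustrative post-processing of a detector's raw output:
# keep detections above a confidence threshold and map
# class indices to label names.

CLASS_NAMES = {0: 'Car', 1: 'Human face', 2: 'Computer monitor', 3: 'Wheel'}

def labels_from_detections(class_ids, scores, threshold=0.5):
    """Return label names for detections scoring above `threshold`."""
    return [CLASS_NAMES[c] for c, s in zip(class_ids, scores) if s >= threshold]

labels = labels_from_detections([0, 2, 1, 3], [0.91, 0.88, 0.42, 0.76])
print(labels)  # ['Car', 'Computer monitor', 'Wheel']
```

The real network performs this filtering over all 600 Open Images classes; the idea is the same.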

Vector SDK

The Vector SDK gives access to various capabilities of this robot, such as computer vision, artificial intelligence and navigation. You can design your own programs to give this robot certain AI capabilities. Before running the module, install the Vector SDK by following the instructions on this page: https://developer.anki.com/vector/docs/index.html.

Running the Module

  1. Please clone or download this repository into your local machine. After downloading, you need to authenticate the vector robot so that the SDK can interact with Vector. To authenticate with the robot, type the following into the Terminal window.
  • python3 -m anki_vector.configure

Please note that the robot and your computer should be connected to the same network. You will be asked to enter your robot’s name, IP address and serial number, which you can find on the robot itself. You will also be asked for the Anki login and password you used to set up your Vector.

2. If you see “SUCCESS!”, your robot is connected to your computer, and you can run the module by typing:

Note: Before running this module please download the pre-trained model from here, https://drive.google.com/file/d/113-3Kud0Ner2NR4ZiSOnQUOq5C4G16CC/view?usp=sharing, and put it inside the data folder.

  • python vector_objectDetection.py

Vector grabbed this picture of me posing, and he says:

I can detect Car, Computer monitor, Human face, Computer monitor, Wheel.

The picture was taken by Vector to detect objects

Now let us go through the code step by step.

The code below receives the picture taken by Vector and calls the object_detection module to detect and identify various objects. Once detected, the object names are sent back to Vector so that he can speak them out.

def get_classnames(image_path):
    """Calls the object detection library to detect up to 600 object classes.
    :param image_path: path to the image taken by Vector
    :return: comma-separated class labels
    """
    try:
        classes = object_detection(image_path)
        if len(classes) == 0:
            return 'no objects'
        class_list = []
        for class_name in classes:
            class_list.append(class_name)
        print('Labels: {}'.format(classes))
        return ', '.join(class_list)
    except Exception as e:
        print('Exception Handled', e)
        return 'no objects'
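Note that Vector’s spoken list above names “Computer monitor” twice, because get_classnames joins the labels as-is. If you prefer each object to be named only once, you can deduplicate the labels while preserving their order before joining them. A small self-contained sketch (the function name is illustrative):

```python
def unique_labels(classes):
    """Collapse repeated labels, keeping first-seen order."""
    seen = set()
    out = []
    for name in classes:
        if name not in seen:
            seen.add(name)
            out.append(name)
    return ', '.join(out)

print(unique_labels(['Car', 'Computer monitor', 'Human face',
                     'Computer monitor', 'Wheel']))
# Car, Computer monitor, Human face, Wheel
```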
  1. Import the libraries
"""Making Vector to detect 600 objects from Google OpenImageDataset"""
import time
from model.detect_odr import object_detection
from PIL import Image
import anki_vector
from anki_vector.util import degrees
import anki_vector.camera
robot = anki_vector.Robot(anki_vector.util.parse_command_args().serial)
screen_dimensions = anki_vector.screen.SCREEN_WIDTH, anki_vector.screen.SCREEN_HEIGHT
image_name = "detect.jpg"

2. Define helper functions to control Vector and work with its camera.

def disconnect_robot():
    robot.disconnect()
    print('Disconnected vector')

def display_camera():
    print('Display camera')
    robot.camera.init_camera_feed()

def close_camera():
    print('Close camera')
    robot.camera.close_camera_feed()

def save_image(file_name):
    print('Save image')
    robot.camera.latest_image.raw_image.save(file_name, 'JPEG')

def display_image(file_name):
    print('display image = {}'.format(file_name))
    image = Image.open(file_name)
    screen_data = anki_vector.screen.convert_image_to_screen_data(image.resize(screen_dimensions))
    robot.screen.set_screen_with_image_data(screen_data, 5.0, True)

def vector_speaks(text):
    print('Vector: {}'.format(text))
    robot.behavior.say_text(text)

3. Define a function to get the object labels and customize Vector to speak the way you want.

def detect():
    robot.connect()
    display_camera()  # start the camera feed so latest_image is populated
    vector_speaks('Hey, I am going to find some objects, and will tell you what I found')
    vector_speaks('I will take a photo of this environment, and will analyze using my deep learning based brain')
    save_image(image_name)  # take the photo that will be analyzed
    display_image(image_name)
    vector_speaks('Wait a minute. I am trying to find some objects, I will let you know now.')
    text = get_classnames(image_name)
    vector_speaks('I can detect {}'.format(text))
    close_camera()
    disconnect_robot()

You can see that Vector calls out the object names when they are detected. This is just a test to show that AI can be used with the Vector SDK; in fact, it can be improved a lot to make Vector even more customizable.