Object detection and Deep Learning: Identify Google Streetview image content using YOLO in R

Original article was published by Hany IMAM on Deep Learning on Medium



Google Street View is a technology featured in Google Maps and Google Earth that provides interactive panoramas from positions along many streets in the world. It was launched in 2007 in several cities in the United States, and has since expanded to include cities and rural areas worldwide. Streets with Street View imagery available are shown as blue lines on Google Maps.

Google Street View displays panoramas of stitched images. Most photography is done by car, but some is done by tricycle, boat, snowmobile, and underwater apparatus, as well as on foot.

Scraping images

Below is the simple script I used to scrape an image from Google Street View with the googleway package:

library(googleway)

key <- "YOUR_API_KEY"  # your Google Maps API key

# Write the Street View panorama to a JPEG file
jpeg(file = "/Users/your/file/location/Sanaa1.jpeg")
google_streetview(location = c(15.3168264, 44.2289662),
                  size = c(400, 600),
                  panorama_id = NULL,
                  output = "plot",
                  heading = 90,
                  fov = 90,
                  pitch = 0,
                  response_check = FALSE,
                  key = key)
dev.off()
Google Streetview extract of a street in Sana’a — Yemen

If you intend to use multiple images at once, you can run a for loop that scrapes images from several different locations.
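A minimal sketch of such a loop, assuming the googleway setup from the script above (the second set of coordinates is a hypothetical example, and `key` holds your API key):

```r
library(googleway)

key <- "YOUR_API_KEY"  # your Google Maps API key

# A list of lat/lon pairs to scrape (second location is hypothetical)
locations <- list(
  c(15.3168264, 44.2289662),
  c(15.3540000, 44.2070000)
)

for (i in seq_along(locations)) {
  # Build a numbered output file name for each location
  out_file <- sprintf("/Users/your/file/location/Sanaa%d.jpeg", i)
  jpeg(file = out_file)
  google_streetview(location = locations[[i]],
                    size = c(400, 600),
                    output = "plot",
                    heading = 90,
                    fov = 90,
                    pitch = 0,
                    key = key)
  dev.off()
}
```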

Deep Learning: Object Detection — Why YOLO?

You only look once (YOLO) is a state-of-the-art, real-time object detection system. On a Pascal Titan X it processes images at 30 FPS and has a mAP of 57.9% on COCO test-dev: https://pjreddie.com/darknet/yolo/

The Darknet YOLO model has several advantages over classifier-based systems. It looks at the whole image at test time, so its predictions are informed by global context in the image. It also makes predictions with a single network evaluation, unlike systems such as R-CNN, which require thousands of evaluations for a single image. This makes it extremely fast: more than 1000x faster than R-CNN and 100x faster than Fast R-CNN.

Below is the short script that will be run on our extracted image to identify its contents:

library(image.darknet)

# Import the tiny YOLO model pre-trained on Pascal VOC
yolo_tiny_voc <- image_darknet_model(
  type = "detect",
  model = "tiny-yolo-voc.cfg",
  weights = system.file(package = "image.darknet", "models", "tiny-yolo-voc.weights"),
  labels = system.file(package = "image.darknet", "include", "darknet", "data", "voc.names"))

# Detect objects in our scraped image
x <- image_darknet_detect(
  file = "/Users/your/file/location/Sanaa1.jpeg",
  object = yolo_tiny_voc,
  threshold = 0.12)
Labels have been added to our image

Changing The Detection Threshold

By default, YOLO only displays objects detected with a confidence of 0.25 or higher. In the darknet command-line tool you change this with the -thresh &lt;val&gt; flag; in the image.darknet R package it is the threshold argument of image_darknet_detect(). For example, to display all detections you can set the threshold to 0.
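As a sketch, re-running the detection step from the script above with a threshold of 0 (assuming the yolo_tiny_voc model object has already been created):

```r
library(image.darknet)

# Load the tiny YOLO VOC model as before
yolo_tiny_voc <- image_darknet_model(
  type = "detect",
  model = "tiny-yolo-voc.cfg",
  weights = system.file(package = "image.darknet", "models", "tiny-yolo-voc.weights"),
  labels = system.file(package = "image.darknet", "include", "darknet", "data", "voc.names"))

# threshold = 0 keeps every candidate box, however low its confidence
x_all <- image_darknet_detect(
  file = "/Users/your/file/location/Sanaa1.jpeg",
  object = yolo_tiny_voc,
  threshold = 0)
```

Note that a threshold of 0 will usually produce many spurious boxes; in practice a value between 0.1 and 0.3 is a more useful starting point.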

Training YOLO on VOC

You can train YOLO from scratch if you want to play with different training regimes, hyper-parameters, or datasets. Here’s how to get it working on the Pascal VOC dataset.

Get The Pascal VOC Data

To train YOLO you will need all of the VOC data from 2007 to 2012. You can find links to the data here. To get all the data, make a directory to store it all and from that directory run:
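The darknet documentation suggests commands along these lines for fetching and unpacking the VOC archives (file names as published on the darknet site; verify them against the links above before running):

```shell
# Download the Pascal VOC 2007 and 2012 archives
wget https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
wget https://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar
wget https://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar

# Unpack them; this creates a VOCdevkit/ subdirectory
tar xf VOCtrainval_11-May-2012.tar
tar xf VOCtrainval_06-Nov-2007.tar
tar xf VOCtest_06-Nov-2007.tar
```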