Vision-based moon-tracker


A complete guide to building a moon tracker (for images and video) based on Mask R-CNN. Swap in your own dataset, mask the object of your choice, and you have your own object tracker. No need to wander around for your first custom tracker.

moon-tracker frame by frame (Image by author)

The Why part (of the tracker)

At some point or other, we have all been fascinated by the beauty of the moon, the biggest and brightest object in the night sky.

There are many apps that track the movement of the moon, and nothing about its path is unpredictable. From that perspective, the moon is an ideal candidate for an easy tracker.

In a broader sense, though, an intelligent object tracker can sit in a fixed position or be mounted on a moving vehicle or even a satellite. Your camera’s field of view may vary. If the object moves, your algorithm should ideally send accurate PTZ commands (direction and rate of movement) to the camera to keep the object of interest in frame; a rough sketch of that idea follows the list below. It’s complex.

Let’s now simplify it.

Two things are important from the algorithm perspective :

  1. Accuracy of detection and localization of the (moving) object over time, in real time.
  2. Processing speed / frame rate. Otherwise, by the time you detect the object, it may already be beyond your camera’s reach.
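
As a rough illustration of the PTZ idea above, here is a minimal sketch (not part of the original project) that turns a detected bounding box into pan/tilt error signals. The frame size, the box format (y1, x1, y2, x2) as Mask R-CNN returns it, and the proportional gain are all assumptions.

def ptz_command(box, frame_w, frame_h, gain=0.1):
    """Map a detection box (y1, x1, y2, x2) to pan/tilt corrections.

    Positive pan means "move right", positive tilt means "move up".
    The gain is a made-up proportional constant; a real mount needs tuning.
    """
    y1, x1, y2, x2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0    # object centre in the frame
    pan_error = cx - frame_w / 2.0               # positive: object right of centre
    tilt_error = frame_h / 2.0 - cy              # positive: object above centre
    return gain * pan_error, gain * tilt_error

# example: a box near the top-right corner of a 1920x1080 frame
print(ptz_command((100, 1500, 300, 1700), 1920, 1080))  # pan right, tilt up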

The What part (what we are going to do)

Since this is more of a technical article, I am sharing just the crux of the project to help you create a moon-tracker (or a tracker for any custom object of your choice) step by step. I assume basic knowledge of Mask R-CNN, but even if you don’t have it, you can still follow along.

A simple moon-tracker based on Mask-RCNN.

Now, the How part

Step 1: Setup the environment and load the necessary packages.

Using Google Colab is optional, but it is a good option:

  1. If data is not sensitive
  2. If you have a lot of data
  3. If you don’t have a local GPU
from google.colab import drive
drive.mount('/content/drive/')
import os
os.chdir("drive/My Drive/Colab Notebooks/moon-tracker/")
from mrcnn.config import Config
from mrcnn import model as modellib
from mrcnn import visualize
import mrcnn
from mrcnn.utils import Dataset
from mrcnn.model import MaskRCNN
import numpy as np
from numpy import zeros
from numpy import asarray
import colorsys
import argparse
import imutils
import random
import cv2
import os
import time
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from keras.models import load_model
%matplotlib inline
from os import listdir
from xml.etree import ElementTree

You have the option to keep your data on Drive and mount it to your environment as above. You can also upload files directly to the session, but they will only be available until the session ends.
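
If you prefer uploading directly to the session, here is a minimal sketch using Colab’s files helper (remember that anything uploaded this way disappears when the session ends):

from google.colab import files

# opens a file picker in the browser; returns a dict of {filename: bytes}
uploaded = files.upload()
for name in uploaded:
    print('uploaded', name, len(uploaded[name]), 'bytes')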

Step 2: Keep your training and test dataset ready.

i.e. your images and the corresponding object masks.

If you don’t have them yet, you can use LabelImg to draw the masks and save them as XML files; a sample annotation is parsed in the sketch after the folder layout below.

(Images by author)

How the train and test sets are read depends on how you store the images and generate the XML files. Refer to the code below if you use a structure like this:

moon-tracker
└── moon_mask_full
    ├── images         (all images, jpg/png etc.)
    └── annotations    (XML files of the image masks)
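
For reference, LabelImg saves each annotation in Pascal VOC XML. Here is a minimal sketch of what such a file looks like and how ElementTree reads it; the values below are illustrative, not taken from the project’s data.

from xml.etree import ElementTree

# an illustrative Pascal VOC style annotation, as LabelImg would produce
sample_xml = """
<annotation>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>moon</name>
    <bndbox><xmin>600</xmin><ymin>200</ymin><xmax>700</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""

root = ElementTree.fromstring(sample_xml.strip())
for box in root.findall('.//bndbox'):
    print([int(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')])
print(int(root.find('.//size/width').text), int(root.find('.//size/height').text))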

class MoonDataset(Dataset):
    # load the dataset definitions
    def load_dataset(self, dataset_dir, is_train=True):

        # Add classes. We have only one class to add.
        self.add_class("dataset", 1, "moon")

        ####################################################################
        # define data locations for images and annotations
        # (adjust these, or build them from dataset_dir, to point at your
        #  own train / test folders)
        images_dir = 'moon_mask_full/images/'
        annotations_dir = 'moon_mask_full/annotations/'
        ####################################################################

        annot_list = []
        for annot in listdir(annotations_dir):
            annot_list.append(annot.split('.')[0])

        # iterate through all files in the folder to
        # add classes, images and annotations
        for filename in listdir(images_dir):

            # extract the image id for all formats like jpg / png / jpeg
            image_id = filename.split('.')[0]

            # you may not have created an XML file for every image,
            # so keep only the images that have a corresponding mask XML file
            if image_id != '' and image_id in annot_list:
                # image file
                img_path = images_dir + filename
                # annotation file
                ann_path = annotations_dir + image_id + '.xml'
                # add the image and its annotation to the dataset
                self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

    # extract bounding boxes from an annotation file
    def extract_boxes(self, filename):

        # load and parse the file
        tree = ElementTree.parse(filename)
        # get the root of the document
        root = tree.getroot()
        # extract each bounding box
        boxes = list()
        for box in root.findall('.//bndbox'):
            xmin = int(box.find('xmin').text)
            ymin = int(box.find('ymin').text)
            xmax = int(box.find('xmax').text)
            ymax = int(box.find('ymax').text)
            coors = [xmin, ymin, xmax, ymax]
            boxes.append(coors)

        # extract image dimensions
        width = int(root.find('.//size/width').text)
        height = int(root.find('.//size/height').text)

        return boxes, width, height

    # load the masks for an image
    def load_mask(self, image_id):
        """Generate instance masks for an image.
        Returns:
            masks: an array of shape [height, width, instance count] with
                one mask per instance.
            class_ids: a 1D array of class IDs of the instance masks.
        """
        # get details of the image
        info = self.image_info[image_id]
        # define the annotation file location
        path = info['annotation']

        # load the XML
        boxes, w, h = self.extract_boxes(path)

        # create one array for all masks, each on a different channel
        masks = zeros([h, w, len(boxes)], dtype='uint8')

        # create masks
        class_ids = list()
        for i in range(len(boxes)):
            box = boxes[i]
            row_s, row_e = box[1], box[3]
            col_s, col_e = box[0], box[2]
            masks[row_s:row_e, col_s:col_e, i] = 1
            class_ids.append(self.class_names.index('moon'))
        return masks, asarray(class_ids, dtype='int32')

    # return the path of the image
    def image_reference(self, image_id):
        info = self.image_info[image_id]
        return info['path']

Step 3: Download and keep the COCO weights ‘mask_rcnn_coco.h5’ in the master folder.

Link: https://github.com/matterport/Mask_RCNN/releases

# load the COCO weights (run this after the model has been created
# in the "Load the base model" step below, since `model` is defined there)
model.load_weights('mask_rcnn_coco.h5',
                   by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
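
If you would rather fetch the weights from code than download them manually, the Mask_RCNN repository ships a small download helper; here is a minimal sketch, assuming the helper exists in the version you installed:

import os
from mrcnn import utils

COCO_WEIGHTS_PATH = 'mask_rcnn_coco.h5'

# download mask_rcnn_coco.h5 from the Mask_RCNN releases page if it is missing
if not os.path.exists(COCO_WEIGHTS_PATH):
    utils.download_trained_weights(COCO_WEIGHTS_PATH)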

Step 4: Tweak the training and learning parameters to fine-tune the model.

Play with the config, learning_rate, epochs, and layers (‘all’, ‘3+’, ‘4+’, ‘heads’) for better accuracy.

class moon_detector_Config(Config):
    # give the configuration a recognizable name
    NAME = "moon_detector_Config"

    # set the number of GPUs to use along with the number of images per GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

    # number of classes (we add +1 for the background)
    # moon + BG
    NUM_CLASSES = 1 + 1

    # number of training steps per epoch
    STEPS_PER_EPOCH = 50

    # learning rate
    LEARNING_RATE = 0.001

    # skip detections with less than 95% confidence
    DETECTION_MIN_CONFIDENCE = 0.95

    # maximum number of ground truth instances per image
    MAX_GT_INSTANCES = 1

    # maximum number of instances in a frame. Only one moon is possible,
    # but other objects (cat, dog, bus, car, etc.) may need a higher value.
    DETECTION_MAX_INSTANCES = 1

config = moon_detector_Config()
config.display()
Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 1
DETECTION_MIN_CONFIDENCE 0.95
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 1024
IMAGE_META_SIZE 14
IMAGE_MIN_DIM 800
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_mask_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'rpn_bbox_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 1
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME moon_detector_Config
NUM_CLASSES 2
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 50
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001

Load the dataset

# prepare train set
train_set = MoonDataset()
train_set.load_dataset('train_data_path/')
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# prepare test/val set
test_set = MoonDataset()
test_set.load_dataset('test_data_path/')
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))
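
Before training, it is worth sanity-checking that the generated masks actually line up with the images. Here is a minimal sketch using the matterport visualization helper (the image index 0 is arbitrary; train_set and visualize come from the earlier steps):

# pick one training image and overlay its generated masks
image_id = train_set.image_ids[0]
image = train_set.load_image(image_id)
mask, class_ids = train_set.load_mask(image_id)
visualize.display_top_masks(image, mask, class_ids, train_set.class_names)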

Load the base model

print("Loading Mask R-CNN model...")
model = modellib.MaskRCNN(mode="training", config=config, model_dir='./')

Tweak learning_rate, epochs, and layers, and train

layers: select from [‘all’, ‘3+’, ‘4+’, ‘heads’]

epochs: in the range [5–50]

# train only the upper layers ('3+') to speed up the learning
model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=50, layers='3+')
history = model.keras_model.history.history
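
The history dict captured above can be used to watch the loss curves. Here is a minimal sketch, assuming the standard Keras keys ‘loss’ and ‘val_loss’ are present (pyplot was imported in Step 1):

# plot training and validation loss across epochs
epochs_range = range(1, len(history['loss']) + 1)
pyplot.plot(epochs_range, history['loss'], label='train loss')
pyplot.plot(epochs_range, history['val_loss'], label='val loss')
pyplot.xlabel('epoch')
pyplot.ylabel('loss')
pyplot.legend()
pyplot.show()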

Save the model for future use

model_path = 'path_to_your_model_folder/' + 'model_name' + '.h5'
model.keras_model.save_weights(model_path)

Step 5: Test on a new set of images

# load the pretrained moon mask model
model_path = 'moon_model/moon_mask_rcnn_125.h5'  # the model you created

from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array

# load the model in inference mode
model = modellib.MaskRCNN(mode="inference", config=config, model_dir='./')
# load the weights trained on the custom dataset
model.load_weights(model_path, by_name=True)

input_path = "test_images/"

for i in os.listdir(input_path):

    img = load_img(input_path + i)
    img = img_to_array(img)

    # run object detection on the image
    results = model.detect([img], verbose=1)

    # display the results
    r = results[0]
    visualize.display_instances(img, r['rois'], r['masks'], r['class_ids'],
                                test_set.class_names, r['scores'],
                                title="Predictions")
Output. (Image by author)

Step 6: Fine-tune the model by adding more relevant mask images or by tuning the parameters in Step 4.
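
To judge whether the fine-tuning actually helps, it is useful to put a number on detection quality. Here is a minimal sketch that computes mean average precision (mAP) over the test set; the helpers load_image_gt and compute_ap come from the Mask_RCNN repository, so treat their availability as an assumption if your version differs.

from numpy import mean
from mrcnn import utils
from mrcnn.model import load_image_gt

APs = []
for image_id in test_set.image_ids:
    # ground-truth boxes, masks and class ids for this image
    image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(
        test_set, config, image_id, use_mini_mask=False)
    # run detection on the image (the model handles resizing internally)
    r = model.detect([image], verbose=0)[0]
    # average precision at the default IoU threshold of 0.5
    AP, precisions, recalls, overlaps = utils.compute_ap(
        gt_bbox, gt_class_id, gt_mask,
        r['rois'], r['class_ids'], r['scores'], r['masks'])
    APs.append(AP)

print('mAP over the test set: %.3f' % mean(APs))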

Step 7: moon-tracker for videos

To do the same for a video, extract the video frame by frame with OpenCV and apply the model to each frame.

import cv2
import numpy as np


def random_colors(N):
    np.random.seed(1)
    colors = [tuple(255 * np.random.rand(3)) for _ in range(N)]
    return colors


def apply_mask(image, mask, color, alpha=0.5):
    """Apply a mask to an image."""
    for n, c in enumerate(color):
        image[:, :, n] = np.where(
            mask == 1,
            image[:, :, n] * (1 - alpha) + alpha * c,
            image[:, :, n]
        )
    return image


def display_instances(image, boxes, masks, ids, names, scores):
    """Take the image and results and apply the mask, box, and label."""
    n_instances = boxes.shape[0]
    colors = random_colors(n_instances)

    if not n_instances:
        print('NO INSTANCES TO DISPLAY')
    else:
        assert boxes.shape[0] == masks.shape[-1] == ids.shape[0]

    for i, color in enumerate(colors):
        if not np.any(boxes[i]):
            continue

        y1, x1, y2, x2 = boxes[i]
        label = names[ids[i]]
        score = scores[i] if scores is not None else None
        caption = '{} {:.2f}'.format(label, score) if score else label
        mask = masks[:, :, i]

        image = apply_mask(image, mask, color)
        image = cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)
        image = cv2.putText(
            image, caption, (x1, y1), cv2.FONT_HERSHEY_COMPLEX, 0.7, color, 2
        )

    return image

Set the path and folder names.

import os
import sys

batch_size = 1

ROOT_DIR = os.getcwd()
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
VIDEO_DIR = os.path.join(ROOT_DIR, "video_moon/")
VIDEO_SAVE_DIR = os.path.join(VIDEO_DIR, "frames_object_detection/")
MODEL_PATH = os.path.join(ROOT_DIR, "moon_model/moon_mask_rcnn_125.h5")
model.load_weights(MODEL_PATH, by_name=True)
# class names indexed by class id (0 is the background)
class_names = ['BG', 'moon']

Get the video and analyze it frame by frame using our model.

capture = cv2.VideoCapture(os.path.join(VIDEO_DIR, 'moon_zoom.mp4'))
try:
    if not os.path.exists(VIDEO_SAVE_DIR):
        os.makedirs(VIDEO_SAVE_DIR)
except OSError:
    print('Error: creating directory of data')

frames = []
frame_count = 0
# these 2 lines can be removed if you don't have a 1080p camera
capture.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)

while True:
    ret, frame = capture.read()
    # bail out when the video file ends
    if not ret:
        break

    # save each frame of the video to a list
    frame_count += 1
    frames.append(frame)
    print('frame_count: {0}'.format(frame_count))
    if len(frames) == batch_size:
        results = model.detect(frames, verbose=0)
        print('Predicted')
        for i, item in enumerate(zip(frames, results)):
            frame = item[0]
            r = item[1]
            frame = display_instances(
                frame, r['rois'], r['masks'], r['class_ids'], class_names, r['scores']
            )
            name = '{0}.jpg'.format(frame_count + i - batch_size)
            name = os.path.join(VIDEO_SAVE_DIR, name)
            cv2.imwrite(name, frame)
            print('writing to file: {0}'.format(name))
        # clear the frames array to start the next batch
        frames = []

capture.release()
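
The loop above only writes the annotated frames to disk. To stitch them back into a playable clip, here is a minimal sketch using cv2.VideoWriter; the output filename, codec, and frame rate are assumptions, and the frame size is read from the first saved frame.

# stitch the saved frames back into a video, in numeric frame order
frame_files = sorted(os.listdir(VIDEO_SAVE_DIR),
                     key=lambda f: int(os.path.splitext(f)[0]))
first = cv2.imread(os.path.join(VIDEO_SAVE_DIR, frame_files[0]))
height, width = first.shape[:2]

fourcc = cv2.VideoWriter_fourcc(*'mp4v')
writer = cv2.VideoWriter(os.path.join(VIDEO_DIR, 'moon_tracked.mp4'),
                         fourcc, 25.0, (width, height))
for f in frame_files:
    writer.write(cv2.imread(os.path.join(VIDEO_SAVE_DIR, f)))
writer.release()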

Here is our end-to-end moon-tracker, ready to go. Please experiment with the code and create your own version of an object tracker.

Zooming in. (Image by author)

Cheers !!!

GitHub link

The whole evolution of the moon-tracker can be found at the link below.
