Mask R-CNN: what is it and how does it work? Attempt 2

Source: Deep Learning on Medium

How the data flows → extract features first → then generate bounding boxes as well as class predictions.

Wow, pretty good results → this is impressive. (Region Proposal Network → is needed for end-to-end training of the network). (Faster R-CNN → is the base and starting point of MRCNN).

The Feature Pyramid Network (FPN) → was another change → and the region of interest generation is different as well.

Some preprocessing of the images is done beforehand as well → this step is specific to FPN.

Very specific preprocessing steps for FPN → interesting.

A lot of feature maps are created and extracted → then we select the regions that contain objects.

Two streams → one for classification and another for bounding box generation.

Wow, this is really hard to train → not just a simple end-to-end model → a complex system that combines different operations.

Depending on whether we are training or testing → some hyper-parameters are different. (Also → not all feature maps are used → only 4 out of 5 are used). (Also → MRCNN → uses RoI Align rather than RoI pooling).
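One possible reading of "only 4 out of 5 feature maps are used" is the level-assignment rule from the FPN paper, where each region of interest is routed to one of the pyramid levels P2 to P5 based on its size. The sketch below is only an illustration under that assumption; the function name `fpn_level` and its parameter names are made up here, and the defaults (k0 = 4, canonical size 224) are the values given in the FPN paper, not something read off the code in this post.

```python
import math

def fpn_level(box, k0=4, canonical_size=224, k_min=2, k_max=5):
    """Pick a pyramid level for a box given as (x1, y1, x2, y2).

    Rule from the FPN paper: k = floor(k0 + log2(sqrt(w * h) / canonical_size)),
    clamped to the levels the RoI head actually uses (assumed P2..P5 here).
    """
    x1, y1, x2, y2 = box
    w = max(x2 - x1, 1e-6)  # guard against degenerate boxes
    h = max(y2 - y1, 1e-6)
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical_size))
    return int(min(max(k, k_min), k_max))

# Small boxes go to fine levels, large boxes to coarse levels.
levels = [fpn_level(b) for b in [(0, 0, 32, 32), (0, 0, 112, 112),
                                 (0, 0, 224, 224), (0, 0, 1000, 1000)]]
```

With this rule a small object is pooled from a high-resolution map and a large object from a low-resolution map, which is how FPN handles different scales.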

Very impressed by the fact → that all of these complex operations can be done end to end → gradients can actually flow through all of the complex networks. (Finally → the bounding boxes are rescaled to the original image size).
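The final rescaling step can be sketched roughly as below. This assumes the image was resized with a plain resize (no padding) before going into the network; if padding was added, it would have to be removed first. The helper name `rescale_boxes` is hypothetical and not taken from the code in the post.

```python
def rescale_boxes(boxes, resized_size, original_size):
    """Map (x1, y1, x2, y2) boxes from the resized input back to the original image.

    resized_size and original_size are (height, width) tuples.
    Assumption: plain resize, no padding or cropping.
    """
    resized_h, resized_w = resized_size
    orig_h, orig_w = original_size
    sx = orig_w / resized_w  # horizontal scale factor
    sy = orig_h / resized_h  # vertical scale factor
    rescaled = []
    for x1, y1, x2, y2 in boxes:
        rescaled.append((
            min(max(x1 * sx, 0.0), float(orig_w)),  # clip to image bounds
            min(max(y1 * sy, 0.0), float(orig_h)),
            min(max(x2 * sx, 0.0), float(orig_w)),
            min(max(y2 * sy, 0.0), float(orig_h)),
        ))
    return rescaled

# A box predicted on an 800x400 (w x h) input, mapped back to a 400x200 original.
boxes_in_original = rescale_boxes([(100, 50, 200, 150)], (400, 800), (200, 400))
```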

The difference with RoI Align → is that the coordinates of the regions of interest line up well with the feature map.
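To make the alignment idea concrete, here is a minimal sketch of the sampling that RoI Align does: instead of rounding region coordinates to whole feature-map cells (as RoI pooling does), values are read at fractional positions with bilinear interpolation and averaged per bin. The function names are made up for illustration, the feature map is a plain list of rows, and the coordinate convention (cell values sitting at integer coordinates) is an assumption that differs between real implementations.

```python
import math

def bilinear_sample(feature, y, x):
    """Read a 2-D feature map (list of rows) at a fractional (y, x) position."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align_bin(feature, y_start, x_start, bin_h, bin_w, samples=2):
    """Average samples x samples regularly spaced bilinear reads inside one bin."""
    total = 0.0
    for iy in range(samples):
        for ix in range(samples):
            y = y_start + (iy + 0.5) * bin_h / samples
            x = x_start + (ix + 0.5) * bin_w / samples
            total += bilinear_sample(feature, y, x)
    return total / (samples * samples)

# A tiny feature map whose value equals the column index.
ramp = [[0, 1, 2, 3] for _ in range(4)]
bin_value = roi_align_bin(ramp, y_start=0.5, x_start=0.5, bin_h=1.0, bin_w=1.0)
```

Because the bin starts at the fractional position 0.5 and nothing is rounded, the pooled value follows the region exactly; rounding the start to 0 or 1 would shift the result.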

Similar code → but on a different dataset → and we are going to use Fast RCNN → this is not MRCNN.

Wow, 2000 bounding boxes are proposed → that is a lot of objects for a given image. (Fast RCNN → only has VGG as a backbone → might be a good idea to use FPN → since it takes care of different scales).

There are two losses → a classification loss → for knowing what the classes are → and a bounding box loss → for comparing the predicted boxes with the ground truth.
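A rough sketch of how the two losses can be combined for a single region is shown below. It follows the multi-task loss described in the Fast R-CNN paper (cross-entropy for the class plus smooth L1 for the box, with the box term switched off for background), but the function names, the `weight` parameter, and the choice of class 0 as background are assumptions made for illustration, not the code from the post.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one region; target is the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 summed over the box coordinates (quadratic near 0, linear beyond beta)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def detection_loss(logits, target_class, box_pred, box_target,
                   background=0, weight=1.0):
    """Classification loss plus box loss; the box term is skipped for background."""
    cls = cross_entropy(logits, target_class)
    box = smooth_l1(box_pred, box_target) if target_class != background else 0.0
    return cls + weight * box

# One foreground region: two classes (0 = background, 1 = object).
loss = detection_loss([0.0, 0.0], 1, [0.5, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
```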