Source: Deep Learning on Medium
How the data flows → extract features first → then generate the bounding boxes as well as the classifications.
Wow, pretty good results → this is impressive. (Region Proposal Network → is needed for end-to-end training of the network). (Faster RCNN → is the backbone and starting point of MRCNN).
The feature pyramid network → was another change → and the Region of interest generation is different as well.
Some preprocessing of the images is done beforehand as well → this step is specific to FPN.
Very specific preprocessing steps for FPN → interesting.
A lot of feature maps are created and extracted → then we are going to select regions with objects.
Two streams → one for classification and another for bounding box generation.
Wow, this is really hard to train → not just a simple end-to-end model → a complex system combining different operations.
Depending on whether we are training or testing → some hyper-parameters are different. (Also → not all feature maps are used → only 4 out of 5). (Also → MRCNN → uses ROI align rather than ROI pooling).
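A rough sketch of why only some feature maps get used per region: in the FPN paper each ROI is assigned to one pyramid level by its size, with k = floor(k0 + log2(sqrt(w*h) / 224)). The function name, the defaults (k0 = 4, levels 2 to 5), and the clamping are assumptions for illustration → not necessarily what the code in these notes does.

```python
import math

def fpn_level(box, k0=4, canonical_size=224, min_level=2, max_level=5):
    """Pick a pyramid level for a box (x1, y1, x2, y2) in image pixels.

    Hypothetical helper: the defaults follow the FPN paper's heuristic,
    and the clamp range (levels 2..5) is an assumption.
    """
    x1, y1, x2, y2 = box
    w = max(x2 - x1, 1e-6)  # guard against degenerate boxes
    h = max(y2 - y1, 1e-6)
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical_size))
    # Small boxes go to fine levels, large boxes to coarse levels.
    return int(min(max(k, min_level), max_level))

print(fpn_level((0, 0, 224, 224)))  # 4
print(fpn_level((0, 0, 56, 56)))    # 2
```

So with four levels available for ROIs, each region only ever reads from one of them.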
Very impressed by the fact → that all of these complex operations can be done end to end → gradients can actually flow through all of the complex networks. (Finally → the bounding boxes are rescaled to the original image size).
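The final rescaling step could look roughly like this. A minimal sketch, assuming the image was resized without padding and boxes are (x1, y1, x2, y2) in pixels → `rescale_boxes` is a made-up name, and implementations that pad the image need to undo the padding offset first.

```python
def rescale_boxes(boxes, resized_shape, original_shape):
    """Map boxes from resized-image pixels back to original-image pixels.

    Assumes a plain resize (no padding); shapes are (height, width).
    """
    resized_h, resized_w = resized_shape
    original_h, original_w = original_shape
    scale_x = original_w / resized_w
    scale_y = original_h / resized_h
    return [
        (x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y)
        for (x1, y1, x2, y2) in boxes
    ]

# A box predicted on a 512x512 input, mapped back to a 1024x2048 original.
print(rescale_boxes([(100, 50, 300, 200)], (512, 512), (1024, 2048)))
# [(400.0, 100.0, 1200.0, 400.0)]
```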
The difference in how ROIs are handled (ROI align) → is there so the coordinates of the regions of interest line up well with the feature map.
Similar code → but on a different dataset → and we are going to use Fast RCNN → this is not MRCNN.
Wow, 2000 bounding boxes are proposed → that is a lot of candidate objects for a given image. (Fast RCNN → only has VGG as backbone → might be a good idea to use FPN → since it takes care of different scales).
There are two losses → a classification loss for knowing what the classes are → as well as a bounding box loss comparing the predicted box with the ground truth box.
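To see what "line up well" means: ROI pooling rounds the box to whole feature-map cells, while ROI align keeps the fractional coordinates and reads values with bilinear interpolation. A simplified sketch in plain Python → one sample per output bin (real implementations usually average several), integer indices treated as sample positions, and both function names are made up for illustration.

```python
import math

def bilinear_sample(feature_map, y, x):
    """Read a 2D feature map (list of lists) at a fractional (y, x)."""
    h, w = len(feature_map), len(feature_map[0])
    y = min(max(y, 0.0), h - 1)  # clamp to the valid range
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature_map, roi, output_size=2):
    """Crop roi = (y1, x1, y2, x2), given in fractional feature-map
    coordinates, into an output_size x output_size grid.
    No rounding of the ROI: each bin is sampled once at its center."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / output_size
    bin_w = (x2 - x1) / output_size
    return [
        [
            bilinear_sample(feature_map, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
            for j in range(output_size)
        ]
        for i in range(output_size)
    ]

fmap = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(roi_align(fmap, (0.5, 0.5, 1.5, 1.5)))  # [[3.0, 3.5], [4.5, 5.0]]
```

With pooling, the same ROI would first be snapped to integer cells → so the crop would shift by up to half a cell, which matters a lot for pixel-level masks.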
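The two losses can be sketched as follows. This follows the Fast RCNN paper's formulation (log loss for the class plus smooth L1 on the box, with no box loss for background ROIs) → the function names, `box_weight`, and the choice of class 0 as background are assumptions for illustration, not taken from the code in these notes.

```python
import math

def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one ROI (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target_index]

def smooth_l1(predicted, target):
    """Smooth L1 summed over the box coordinates."""
    total = 0.0
    for p, t in zip(predicted, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def detection_loss(logits, target_index, box_pred, box_target, box_weight=1.0):
    """Classification loss plus box loss for one ROI.

    Assumption: class index 0 is background, which has no ground-truth
    box, so its box loss is skipped.
    """
    cls_loss = cross_entropy(logits, target_index)
    box_loss = smooth_l1(box_pred, box_target) if target_index != 0 else 0.0
    return cls_loss + box_weight * box_loss

print(smooth_l1([0.5, 3.0], [0.0, 1.0]))  # 1.625
```

Both terms are added into one number → which is what lets a single backward pass train the two streams together.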