Source: Deep Learning on Medium
Mask R-CNN: what is it and how does it work? Attempt 1
Instance segmentation → hard since we have to separate and count every instance in the image → and the instances can overlap.
Faster RCNN → upgraded → Mask RCNN.
Basically → the image is encoded → this is feature encoding. (done by another network that was trained on object classification).
Feature Pyramid Network (FPN) → we are going to extract features at different scales → making things much more powerful. (FPN → can choose among multiple feature maps).
Then it creates region proposals (candidate bounding boxes) → super cool. (if there is an overlap → it might be hard to decide which anchor boxes to use).
To fit the object better, non-maximum suppression is applied to the overlapping boxes. (this is good).
Now for each bounding box → for EACH bounding box, we are going to classify → this is why it is so powerful and dynamic.
Finally, → we are going to generate a segmentation mask. (the ground truth mask is scaled down to 256*256) → for easier and faster training.
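Scaling a ground truth mask down can be sketched with nearest-neighbor sampling. This is only a toy illustration in plain Python: the function name `resize_mask_nearest` and the 4x4 example are made up here, and a real pipeline would use an image library instead.

```python
def resize_mask_nearest(mask, out_h, out_w):
    """Downscale a binary mask (list of rows) by nearest-neighbor sampling."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        # Map each output pixel back to a source pixel with integer scaling.
        [mask[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Toy 4x4 mask with an object in the right-hand columns (hypothetical data).
mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
small = resize_mask_nearest(mask, 2, 2)
```

The smaller mask keeps the rough shape of the object but loses detail at the edges, which is the trade-off for cheaper training.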
The author used different kinds of annotation tools.
Feature extraction network is the backbone → without this it is nothing.
The only difference from Faster RCNN → is the mask network → to generate a segmentation mask. (and there is some other refinement stage as well).
Non-maximum suppression → since occlusion is a huge problem.
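A minimal sketch of greedy non-maximum suppression, assuming boxes are given as `(x1, y1, x2, y2)` tuples and that an IoU threshold of 0.5 is used (both are assumptions for illustration, not values taken from the talk):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (hypothetical data).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The second box is dropped because it overlaps the higher-scoring first box too much. This also shows why occlusion is painful: two genuinely different objects that overlap heavily can be merged into one detection.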
Separate different objects and for each object → give some identity. (there are 7 objects in this image).
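Giving each object an identity can be sketched as turning a list of per-instance binary masks into one label map. The helper `label_map` and the rule that earlier masks win on overlapping pixels are assumptions made for this toy example.

```python
def label_map(masks):
    """Merge per-instance binary masks into one map of instance ids.

    0 means background; instance k (1-based) gets id k. Where masks
    overlap, the earlier mask in the list wins (an assumption here).
    """
    h, w = len(masks[0]), len(masks[0][0])
    labels = [[0] * w for _ in range(h)]
    for instance_id, mask in enumerate(masks, start=1):
        for r in range(h):
            for c in range(w):
                if mask[r][c] and labels[r][c] == 0:
                    labels[r][c] = instance_id
    return labels


# Two small overlapping instances (hypothetical data).
mask_a = [[1, 1, 0],
          [0, 0, 0]]
mask_b = [[0, 1, 1],
          [0, 0, 1]]
labels = label_map([mask_a, mask_b])
num_instances = len({v for row in labels for v in row if v != 0})
```

Counting the distinct non-zero ids gives the number of instances, and mapping each id to a color gives the usual visualization.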
Quite a small feature map → but we are going to build a pyramid.
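To get a feel for the pyramid, here is a small sketch that computes the feature map size at each level. The strides `(4, 8, 16, 32, 64)` and the 1024x1024 input are assumptions chosen for illustration; the actual values depend on the backbone and configuration.

```python
import math


def pyramid_shapes(image_hw, strides=(4, 8, 16, 32, 64)):
    """Feature map (height, width) at each pyramid level for a given input."""
    h, w = image_hw
    return [(math.ceil(h / s), math.ceil(w / s)) for s in strides]


shapes = pyramid_shapes((1024, 1024))
```

The coarsest level is tiny compared with the input, while the finest level keeps enough resolution for small objects.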
The anchors → start as predefined boxes at fixed locations, set before training → this is a much easier approach.
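A rough sketch of how such predefined anchors could be laid out: one set of boxes per feature map cell, with a few aspect ratios. The function `make_anchors`, the stride of 16, the scale of 32 and the ratios `(0.5, 1.0, 2.0)` are all hypothetical choices for this example.

```python
import math


def make_anchors(feat_h, feat_w, stride, scales=(32,), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2), centered on each feature map cell."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this cell in input image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays at scale ** 2.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(2, 2, stride=16)
```

Nothing here is learned: the network only has to predict small corrections to these fixed boxes, which is why it is the easier approach.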
Computer vision is a cool field → we are asking the network to draw a bounding box in the image → this is very important and has a lot of applications.
The softmax → scores each candidate region as object or background → the top-scoring regions are proposed as likely to contain some kind of object.
Some networks use → a fully convolutional network (FCN) → to get the segmentation map.
Basically, if the feature extraction backbone is done well → consider the problem solved.
They just combined two different state-of-the-art models. (coloring is pretty easy → just showing the instances with different color maps).