My Personal Cheat Sheet of Deep Learning for Object Detection

Source: Deep Learning on Medium


Recently, I wanted to build a system to detect objects in some image data, so I started learning about the Region-based Convolutional Neural Network (RCNN) and its evolution, existing CNN models, and so on. It was very hard for me to grasp because many new terms appear in the reference readings.

To overcome this, I wrote down an explanation of every special term that appeared in my reference readings.

  1. CNN is a neural network built around a special kind of layer, the convolutional layer (usually paired with pooling layers). In a conventional CNN model, the last layer is usually a dense (fully connected) layer.
  2. RCNN is a CNN-based detector and the first generation of the RCNN family. It uses the selective search algorithm to propose bounding boxes that may cover the objects we want to detect. First, we apply selective search to obtain the bounding boxes (region proposals). Then, we feed each proposed region to the CNN model.
  3. Fast-RCNN is the second generation of the RCNN model and is faster than RCNN. The original input image is fed to the CNN model once for feature extraction. We still use selective search to get region proposals, but each proposal is mapped onto the feature map and passed through the Region of Interest (RoI) layer, which gives it a fixed matrix size (proposed regions come in many shapes). Once the regions have a fixed size, we feed them to dense layers to predict the class of each region and refine its bounding box.
  4. Faster-RCNN is the third generation of the RCNN model and is faster than Fast-RCNN (in speed: Faster-RCNN > Fast-RCNN > RCNN). It eliminates the slow selective search algorithm. To get region proposals, it uses a Region Proposal Network (RPN) instead.
  5. Feature Extraction means taking the output of the last layer before the dense layers of a conventional CNN model (or of an earlier layer) instead of the final prediction.
  6. Feature Map is the matrix resulting from feature extraction.
  7. Selective Search is a region proposal algorithm. It works by over-segmenting the image based on pixel intensity using a graph-based segmentation method, then repeatedly merging similar neighboring segments into larger candidate regions.
  8. RoI Layer is a special pooling layer that turns each proposed region into the fixed size expected by the next dense layer of the neural network.
  9. RPN is a neural network that has 2 heads: one classifies each candidate region as background or foreground (object), and the other predicts the bounding box coordinates.
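  10. Anchor is a predefined reference box in Faster-RCNN. At each position of the feature map, the RPN places several anchors of different scales and aspect ratios centered on that position, and it predicts region proposals relative to them.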
  11. Intersection over Union (IoU) is a metric for how well two boxes overlap, for example a region proposal and a ground-truth box. It is computed as the area of overlap / the area of union.
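
To make the IoU definition in item 11 concrete, here is a minimal sketch in plain Python. The function name `iou` and the box format `(x1, y1, x2, y2)` (top-left and bottom-right corners) are my own assumptions for illustration, not something taken from the references.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (0 if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # The overlap is counted once in each area, so subtract it once.
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
```

Identical boxes give 1.0 and boxes that do not touch give 0.0, so a higher IoU means the proposal matches the target box better.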
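
The anchor idea from item 10 can also be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the 3 scales, 3 aspect ratios, and stride of 16 pixels are assumed example values that give 9 anchors per position.

```python
import math


def make_anchors(center_x, center_y, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchor boxes (x1, y1, x2, y2) centered on one position.

    ratio is height / width; every anchor of a given scale keeps an
    area of roughly scale * scale pixels.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((
                center_x - width / 2, center_y - height / 2,
                center_x + width / 2, center_y + height / 2,
            ))
    return anchors


def anchors_for_feature_map(map_height, map_width, stride=16):
    """Place the same set of anchors at every feature-map cell.

    stride is the assumed number of image pixels per feature-map cell.
    """
    all_anchors = []
    for row in range(map_height):
        for col in range(map_width):
            # Center of this cell, expressed in image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            all_anchors.extend(make_anchors(cx, cy))
    return all_anchors
```

Every anchor shares the same center as its feature-map position; only the width and height change. The RPN then scores each anchor as foreground or background and adjusts its coordinates.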
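
For the RoI layer in item 8, the sketch below shows how max pooling over a grid of bins turns regions of different shapes into the same output size. It is a simplified illustration under my own assumptions: a single-channel feature map stored as a list of lists, a region given as `(x1, y1, x2, y2)` in feature-map cells with `x2` and `y2` exclusive, and a hypothetical function name `roi_max_pool`.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of a 2D feature map into an out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    if roi_h < 1 or roi_w < 1:
        raise ValueError("the region must cover at least one cell")

    pooled = []
    for i in range(out_h):
        # Row range of bin i: floor of the start, ceiling of the end.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + -(-((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + -(-((j + 1) * roi_w) // out_w)
            # Keep the strongest activation inside this bin.
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [[0, 1, 2, 3],
            [4, 5, 6, 7],
            [8, 9, 10, 11],
            [12, 13, 14, 15]]
    print(roi_max_pool(fmap, (0, 0, 4, 4)))  # 4x4 region -> 2x2 output
    print(roi_max_pool(fmap, (1, 0, 4, 3)))  # 3x3 region -> 2x2 output
```

Both calls return a 2x2 grid even though the regions differ in size, which is what lets the following dense layer accept any proposal.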

I will extend this list in the future if I find new terms related to object detection in deep learning.
