Source: Deep Learning on Medium
Deep Learning based Object Detection algorithms
What kinds of methods are there for object detection? → video and image analysis can be done much better.
There is still a lot of room for improvement.
Before the deep learning days, the general method was to use handcrafted features → now we can use deep learning to do object detection.
The first approach was to use a network to extract features → and then mark the regions that are interesting. (multiple bounding boxes are proposed → and the classification can be done via deep learning as well).
This approach is not end to end → and the setup is complex.
Another approach that is not used very much now is SPP (spatial pyramid pooling) → features are pooled at several scales into a fixed-length output, whatever the input size. (this is a very good approach). (also much faster inference)
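To make the pooling idea concrete, here is a minimal sketch in plain Python, assuming max pooling over a 1x1 and a 2x2 grid. The function name `spatial_pyramid_pool` and the `levels` parameter are made up for illustration and do not come from any library. It only shows that inputs of different sizes end up as vectors of the same length.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a 2-D feature map (list of rows) over an n x n grid for each
    n in `levels`, and concatenate the results into one flat vector."""
    h, w = len(feature_map), len(feature_map[0])
    vector = []
    for n in levels:
        for i in range(n):
            # Bin i covers rows floor(i*h/n) .. ceil((i+1)*h/n) - 1.
            r0 = (i * h) // n
            r1 = -(-((i + 1) * h) // n)
            for j in range(n):
                c0 = (j * w) // n
                c1 = -(-((j + 1) * w) // n)
                vector.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return vector


small = [[1, 2],
         [3, 4]]
large = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]

# Both maps give a vector of length 1*1 + 2*2 = 5.
small_vec = spatial_pyramid_pool(small)
large_vec = spatial_pyramid_pool(large)
```

Because the output length depends only on `levels`, a fully connected layer can sit on top of it without resizing the input image first.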
Fast RCNN → this was the first breakthrough → training happens in a single stage, even though a lot of complex machinery is still involved.
Region of interest (RoI) pooling → basically a pooling layer → where the pooled region changes from proposal to proposal, but the output size stays fixed. (hence we get a more dynamic pooling operation).
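A rough sketch of RoI max pooling in plain Python, assuming a single-channel feature map and a box given in feature-map cells with inclusive corners. The name `roi_max_pool` and the `(x0, y0, x1, y1)` box layout are choices made here for illustration, not a library API.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x0, y0, x1, y1) of a 2-D feature map
    down to a fixed out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        # Split the RoI rows into out_h bins (floor start, ceil end),
        # so every bin holds at least one cell.
        r0 = y0 + (i * roi_h) // out_h
        r1 = y0 + -(-((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * roi_w) // out_w
            c1 = x0 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
]

# Two regions of different size give the same 2x2 output shape.
pooled_small = roi_max_pool(feature_map, (1, 0, 3, 2))
pooled_full = roi_max_pool(feature_map, (0, 0, 4, 3))
```

The fixed output shape is what lets every proposal, whatever its size, go through the same layers afterwards.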
Fast RCNN → batch size of 2 images → since each image already comes with around 2000 regions of interest.
The network improved to Faster RCNN → and this is very interesting → since we have upgraded to a more dynamic network. (the region proposal network now generates the proposals).
Very complicated anchor box system → but this is the reason why it is the state of the art.
So there are two streams → one is for classification and another is for bounding box generation.
Another company → implementing RCNN → and they are explaining how they have done it.
Feature extraction → then pooling and further operations. (this is so cool). (region proposal network → uses the extracted features → to create bounding boxes).
And there is no single backbone network to use → however, in general, it is a good idea to use deeper networks.
So they specifically used those layers → but using one feature map might not be the best idea. (since scale changes and occlusion can happen anywhere).
Such a cute idea → rather than trying to predict bounding boxes directly → we put some anchor points here and there → and use those points as reference points.
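As an illustration of the anchor idea, here is a small sketch that places one anchor point at the center of each feature-map cell and attaches a few reference boxes to it. The stride, scales and aspect ratios below are assumed values chosen for the example, and `generate_anchors` is a hypothetical helper, not a library function.

```python
def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return anchor boxes (x_min, y_min, x_max, y_max) in image pixels,
    one set of len(scales) * len(ratios) boxes per feature-map cell."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Anchor point: center of this cell mapped back to the image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays scale ** 2.
                    w = scale / ratio ** 0.5
                    h = scale * ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3, stride=16,
                           scales=[32, 64], ratios=[0.5, 1.0, 2.0])
```

The network then only has to predict small offsets relative to these reference boxes, plus a score for whether each one contains an object.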
Quite a complex method is used → for the binary cross-entropy loss → they try to balance the foreground and background regions. (this is for gradient stabilization).
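One common way to do this balancing is to sample a fixed-size mini-batch of anchors with a cap on the foreground share, then compute binary cross-entropy only on the sampled anchors. The sketch below assumes that scheme. The helper names, the batch size and the 50% foreground fraction are illustrative assumptions, not details taken from the talk.

```python
import math
import random


def sample_balanced(labels, batch_size=256, fg_fraction=0.5, seed=0):
    """Pick anchor indices so that foreground (label 1) fills at most
    fg_fraction of the batch; background (label 0) fills the rest."""
    rng = random.Random(seed)
    fg = [i for i, label in enumerate(labels) if label == 1]
    bg = [i for i, label in enumerate(labels) if label == 0]
    n_fg = min(len(fg), int(batch_size * fg_fraction))
    n_bg = min(len(bg), batch_size - n_fg)
    return rng.sample(fg, n_fg), rng.sample(bg, n_bg)


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over predicted objectness probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


# 10 foreground anchors against 1000 background anchors.
labels = [1] * 10 + [0] * 1000
fg_idx, bg_idx = sample_balanced(labels, batch_size=16)
loss = binary_cross_entropy([0.5, 0.5], [1, 0])
```

Without the sampling step, the many background anchors would dominate the loss and its gradients.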
There is another network → that takes in the flattened feature map → and produces finer-grained region proposals. (also take note → there is one stream for classification → this is a good approach → where the dual learning can force the network to learn more dynamic features).
End to end training is much better → and there are four different losses in total → this is a lot LOL!
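In Faster RCNN the four losses are a classification loss and a box regression loss for the region proposal network, plus the same pair for the final detection head. A minimal sketch of how they could be combined is below. The equal weights are an assumption, and `smooth_l1` and `total_loss` are illustrative helpers written for this note.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss, commonly used for box regression: quadratic for
    small errors, linear for large ones. Averaged over all coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)


def total_loss(rpn_cls, rpn_reg, det_cls, det_reg,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four loss terms (weights are an assumption)."""
    terms = (rpn_cls, rpn_reg, det_cls, det_reg)
    return sum(w * term for w, term in zip(weights, terms))


box_loss = smooth_l1([0.5, 3.0], [0.0, 0.0])
combined = total_loss(rpn_cls=0.25, rpn_reg=0.5, det_cls=1.0, det_reg=0.25)
```

Since everything is summed into one scalar, a single backward pass updates the backbone, the proposal network and the detection head together.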
And depending on the network size → we can also use regularization.
The classical model of keypoint detection → is to use some default models → this is not robust and does not generalize well.
Hence starting from 2014 → people started to use deep learning models for keypoint detection.
Convolutional Pose Machines → pose machines → but now done with heatmaps → there is also an iterative prediction. (but this was done even before batch normalization was developed).
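To show what a heatmap means here, the sketch below builds a Gaussian target heatmap around one keypoint and reads the keypoint back as the location of the maximum. The Gaussian target and the argmax decoding are a common convention assumed for this example, and both function names are made up.

```python
import math


def gaussian_heatmap(height, width, kx, ky, sigma=1.0):
    """Target heatmap with a Gaussian bump centered on keypoint (kx, ky)."""
    return [[math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def decode_keypoint(heatmap):
    """Return the (x, y) position of the largest heatmap value."""
    _, x, y = max((value, x, y)
                  for y, row in enumerate(heatmap)
                  for x, value in enumerate(row))
    return x, y


heatmap = gaussian_heatmap(height=5, width=7, kx=4, ky=2)
keypoint = decode_keypoint(heatmap)
```

In an iterative model, each stage would output one such heatmap per keypoint and the next stage would refine it.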
However, an iterative method is generally a good idea.
The next model → also has an iterative process. (this might be a good approach) → however, can we use recurrent connections to replace the iterative approach?
There are stacked autoencoders → and after this → we have a simple autoencoder with deconvolutional layers. (and even with L2 loss it was working fairly well!).
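For reference, the L2 loss on heatmaps is just the mean squared difference between the predicted and target maps. A tiny sketch, assuming both are given as equally sized lists of rows (`l2_heatmap_loss` is a name chosen here):

```python
def l2_heatmap_loss(pred, target):
    """Mean squared error between two heatmaps of the same shape."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


pred = [[0.0, 1.0],
        [0.5, 0.0]]
target = [[0.0, 1.0],
          [0.0, 0.0]]
loss = l2_heatmap_loss(pred, target)
```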
Other networks use optical flow → to improve pose estimation.