Deep Learning based Object Detection algorithms

Source: Deep Learning on Medium


What kinds of methods are there for object detection? → Better detectors make video and image analysis much more effective.

There is still a lot of room for improvement.

The general method used to be handcrafted features → but that was before the deep learning era (now we can use deep learning to do object detection).

The first approach (R-CNN) was to → propose candidate regions that might be interesting (many bounding boxes per image) → use a network to extract features from each region → and then classify each region (the classification can be done via deep learning as well).
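To decide whether a proposed region "is" an object during training, proposals are usually compared with the ground-truth boxes using intersection over union (IoU). Below is a minimal sketch under stated assumptions: boxes are `(x1, y1, x2, y2)` pixel tuples, and the 0.5 threshold is the kind of value R-CNN used for positive proposals. The names `iou` and `label_proposals` are hypothetical helpers, not a library API.

```python
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1 (assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals: Sequence[Box], gt_boxes: Sequence[Box],
                    threshold: float = 0.5) -> List[bool]:
    """Hypothetical helper: True if a proposal overlaps some ground-truth box enough."""
    return [max((iou(p, g) for g in gt_boxes), default=0.0) >= threshold
            for p in proposals]


proposals = [(0, 0, 10, 10), (5, 5, 15, 15), (50, 50, 60, 60)]
print(label_proposals(proposals, [(0, 0, 10, 10)]))  # → [True, False, False]
```

The second proposal overlaps the object, but its IoU is only 25 / 175 ≈ 0.14, so it is still treated as background.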

This approach is not end to end → and the setup is complex (the separate stages have to be trained one after another).

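Another approach, which is not used very much now, is SPP (spatial pyramid pooling) → the convolutional feature map is pooled over grids of several sizes, which produces a fixed-length feature vector no matter how large the input is. (This is a very good approach.) (Inference is also much faster, since the convolutional features are computed once per image instead of once per region.)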
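The key property of spatial pyramid pooling is that differently sized feature maps all produce vectors of the same length. The sketch below illustrates that on plain Python lists; the pyramid levels `(1, 2, 4)` and the output ordering are assumptions for illustration, and `spatial_pyramid_pool` is a hypothetical name, not SPPnet's code.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]  # one channel: H rows x W columns


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def spatial_pyramid_pool(channels: Sequence[FeatureMap],
                         levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool every channel over an n x n grid for each pyramid level n.

    Output length is len(channels) * sum(n * n for n in levels),
    independent of the spatial size of the feature maps.
    """
    out: List[float] = []
    for n in levels:
        for fmap in channels:
            h, w = len(fmap), len(fmap[0])
            for i in range(n):
                # Bin rows: floor at the start, ceil at the end, at least one cell.
                y0 = i * h // n
                y1 = max(ceil_div((i + 1) * h, n), y0 + 1)
                for j in range(n):
                    x0 = j * w // n
                    x1 = max(ceil_div((j + 1) * w, n), x0 + 1)
                    out.append(max(fmap[y][x]
                                   for y in range(y0, y1) for x in range(x0, x1)))
    return out


small = [[float(y * 4 + x) for x in range(4)] for y in range(4)]  # 4 x 4
large = [[float(y * 9 + x) for x in range(9)] for y in range(7)]  # 7 x 9
print(len(spatial_pyramid_pool([small])), len(spatial_pyramid_pool([large])))  # → 21 21
```

Both maps yield 1 + 4 + 16 = 21 values per channel, which is what lets a fixed-size fully connected layer sit on top of regions of any size.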

Fast R-CNN → this was the first big breakthrough → training is done in a single stage with a multi-task loss, although it still relies on an external method to propose regions.

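Region of interest (RoI) pooling → basically a max-pooling layer whose pooling window depends on the region of interest → each region is divided into a fixed grid of cells, so regions of any size produce an output of the same size. (Hence the pooling operation is more dynamic.)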
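A minimal sketch of RoI max pooling on a single-channel feature map, assuming the RoI has already been projected into integer feature-map coordinates (image box divided by the backbone stride). It is essentially one level of the pyramid pooling above applied to a crop. The function name `roi_max_pool` and the 2 x 2 output size are assumptions for illustration; real layers work on batched tensors with many channels.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # one channel: H rows x W columns


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(fmap: FeatureMap, roi: Tuple[int, int, int, int],
                 out_h: int = 2, out_w: int = 2) -> List[List[float]]:
    """Max-pool one RoI into a fixed out_h x out_w grid.

    roi = (x1, y1, x2, y2) in feature-map cells, end-exclusive (assumed convention).
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    assert h >= 1 and w >= 1, "RoI must cover at least one cell"
    pooled = []
    for i in range(out_h):
        ys = y1 + i * h // out_h
        ye = y1 + max(ceil_div((i + 1) * h, out_h), i * h // out_h + 1)
        row = []
        for j in range(out_w):
            xs = x1 + j * w // out_w
            xe = x1 + max(ceil_div((j + 1) * w, out_w), j * w // out_w + 1)
            # Each output cell is the max over its (possibly uneven) sub-window.
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


fmap = [[float(y * 6 + x) for x in range(6)] for y in range(6)]
print(roi_max_pool(fmap, (0, 0, 4, 4)))  # → [[7.0, 9.0], [19.0, 21.0]]
print(roi_max_pool(fmap, (1, 1, 6, 4)))  # → [[15.0, 17.0], [21.0, 23.0]]
```

A 4 x 4 region and a 5 x 3 region both come out as 2 x 2, so the layers after pooling never see the region size.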

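Fast R-CNN → each mini-batch contains only 2 images → but many regions of interest are sampled from each one (around 2000 proposals are generated per image, and 64 RoIs per image make up a mini-batch of 128). Since the RoIs from one image share a single convolutional forward pass, this is much cheaper than taking one RoI from each of 128 different images.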
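The "few images, many RoIs" idea can be sketched as hierarchical sampling: pick N images, then R / N RoIs from each. This is a simplified sketch assuming proposals are already computed per image; the Fast R-CNN paper also fixes the share of foreground RoIs (about 25%, with IoU of at least 0.5), which this sketch leaves out. `hierarchical_sample` and the dummy dataset are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]


def hierarchical_sample(proposals_per_image: Sequence[Sequence[Box]],
                        images_per_batch: int = 2,
                        rois_per_batch: int = 128,
                        seed: int = 0) -> Tuple[List[int], List[Tuple[int, Box]]]:
    """Pick a few images, then many RoIs from each picked image.

    Only `images_per_batch` backbone forward passes are needed, because all
    RoIs from one image are pooled from that image's shared feature map.
    """
    rng = random.Random(seed)
    image_ids = rng.sample(range(len(proposals_per_image)), images_per_batch)
    per_image = rois_per_batch // images_per_batch
    batch: List[Tuple[int, Box]] = []
    for img in image_ids:
        props = list(proposals_per_image[img])
        for roi in rng.sample(props, min(per_image, len(props))):
            batch.append((img, roi))
    return image_ids, batch


# Hypothetical dataset: 10 images, each with 2000 dummy proposals.
dataset = [[(i, i, i + 10, i + 10) for i in range(2000)] for _ in range(10)]
ids, batch = hierarchical_sample(dataset)
print(len(ids), len(batch))  # → 2 128
```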

The network was then improved into Faster R-CNN → and this is very interesting → since the region proposals are now generated by the network itself (a Region Proposal Network, RPN, replaces the external proposal method).

It uses a fairly complicated anchor box system → but this is one of the reasons it reached state-of-the-art results.
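Anchors are just a fixed set of reference boxes tiled over the image. The sketch below assumes the commonly cited Faster R-CNN defaults (3 scales, 3 aspect ratios, so 9 anchors per location) and a backbone stride of 16; `make_anchors` is a hypothetical helper, and real implementations differ in details such as rounding and centre offsets.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def make_anchors(feat_h: int, feat_w: int, stride: int = 16,
                 scales: Sequence[float] = (128, 256, 512),
                 ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """One anchor per (scale, ratio) pair, centred on every feature-map cell."""
    anchors: List[Box] = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Centre of this feature-map cell, mapped back to image pixels.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # r = height / width; the area stays s * s for every ratio.
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # → 54  (2 * 3 cells x 9 anchors per cell)
```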

So there are two streams → one for classification and another for bounding box regression.

Another company → implemented R-CNN → and explained how they did it.

Feature extraction comes first → and then pooling and the other operations. (This is so cool.) (The region proposal network → uses the extracted features → to create candidate bounding boxes.)

And there is no → single backbone network that must be used → however, in general, deeper networks tend to work better.

So they specifically used the features from particular → layers → but using only one feature map might not be the best idea (since objects can appear at any scale and can be occluded anywhere).

Such a cute idea → rather than trying to predict bounding boxes directly → we place anchor boxes here and there → and use them as reference points, predicting only offsets relative to each anchor.
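"Predicting offsets relative to an anchor" usually means the (tx, ty, tw, th) parameterization from the R-CNN family: centre shifts scaled by the anchor size, and log-ratios of width and height. A minimal sketch of encoding a target box against an anchor and decoding a prediction back; `encode` and `decode` are hypothetical helper names.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Deltas = Tuple[float, float, float, float]  # (tx, ty, tw, th)


def to_center(b: Box) -> Tuple[float, float, float, float]:
    """Convert corners to (centre x, centre y, width, height)."""
    w, h = b[2] - b[0], b[3] - b[1]
    return b[0] + w / 2, b[1] + h / 2, w, h


def encode(gt: Box, anchor: Box) -> Deltas:
    """Regression target: how to move and stretch the anchor to reach gt."""
    gx, gy, gw, gh = to_center(gt)
    ax, ay, aw, ah = to_center(anchor)
    return (gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah)


def decode(deltas: Deltas, anchor: Box) -> Box:
    """Apply predicted deltas to an anchor to get a box in image coordinates."""
    tx, ty, tw, th = deltas
    ax, ay, aw, ah = to_center(anchor)
    cx, cy = ax + tx * aw, ay + ty * ah
    w, h = aw * math.exp(tw), ah * math.exp(th)
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


anchor, gt = (0.0, 0.0, 100.0, 100.0), (10.0, 20.0, 130.0, 100.0)
deltas = encode(gt, anchor)
print([round(d, 3) for d in deltas])  # → [0.2, 0.1, 0.182, -0.223]
print(decode(deltas, anchor))         # recovers gt (up to float rounding)
```

Normalizing by the anchor size keeps the targets in a similar numeric range for small and large objects, which makes the regression easier to learn.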

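A fairly complex sampling scheme is used → for the binary cross-entropy (objectness) loss → they balance the foreground and background regions in each mini-batch. (This stabilizes the gradients, since otherwise the huge number of easy background anchors would dominate them.)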
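Faster R-CNN labels anchors by IoU (above 0.7 is foreground, below 0.3 is background, in between is ignored) and samples a mini-batch of 256 anchors with at most half foreground. The sketch below assumes each anchor's best IoU has already been computed and omits the paper's extra rule that the highest-IoU anchor for each ground-truth box is also positive. `sample_rpn_anchors` is a hypothetical name.

```python
import random
from typing import List, Sequence


def sample_rpn_anchors(max_ious: Sequence[float], batch_size: int = 256,
                       pos_fraction: float = 0.5, pos_thresh: float = 0.7,
                       neg_thresh: float = 0.3, seed: int = 0) -> List[int]:
    """Return one label per anchor: 1 = foreground, 0 = background, -1 = ignored.

    max_ious[i] is anchor i's best IoU with any ground-truth box.
    """
    rng = random.Random(seed)
    pos = [i for i, v in enumerate(max_ious) if v > pos_thresh]
    neg = [i for i, v in enumerate(max_ious) if v < neg_thresh]
    # At most half the mini-batch is foreground; background fills the rest.
    num_pos = min(len(pos), int(batch_size * pos_fraction))
    num_neg = min(len(neg), batch_size - num_pos)
    labels = [-1] * len(max_ious)  # anchors left at -1 do not contribute to the loss
    for i in rng.sample(pos, num_pos):
        labels[i] = 1
    for i in rng.sample(neg, num_neg):
        labels[i] = 0
    return labels


# Typical case: very few foreground anchors, thousands of background ones.
ious = [0.9] * 10 + [0.5] * 50 + [0.1] * 5000
labels = sample_rpn_anchors(ious)
print(labels.count(1), labels.count(0), labels.count(-1))  # → 10 246 4804
```

Only 256 of the 5060 anchors enter the loss, and the ambiguous IoU = 0.5 anchors are never used.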

There is another network → that takes in the pooled, flattened feature map of each proposal → and produces finer-grained region proposals. (Also take note → there is one stream for classification and one for box regression → this is a good approach → since this multi-task learning can force the network to learn richer features.)

End-to-end training is much better → and there are four different losses in total (objectness and box regression for the RPN, classification and box regression for the detection head) → this is a lot LOL!
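The four losses can be written down for a single anchor and a single RoI: a binary log loss for RPN objectness, a multi-class cross-entropy for RoI classification, and smooth L1 for both box regressions. This is a sketch under simplifying assumptions: equal weights, no averaging over mini-batches, and box losses computed unconditionally (in the papers they apply only to foreground samples and are balanced by a weight λ). All function names are hypothetical.

```python
import math
from typing import Sequence


def binary_cross_entropy(p: float, y: int) -> float:
    """RPN objectness loss: p = predicted P(foreground), y in {0, 1}."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """RoI classification loss via a numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Box loss summed over (tx, ty, tw, th): quadratic below 1, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def detector_loss(rpn_p: float, rpn_y: int,
                  rpn_deltas: Sequence[float], rpn_targets: Sequence[float],
                  cls_logits: Sequence[float], cls_target: int,
                  box_deltas: Sequence[float], box_targets: Sequence[float]) -> float:
    """Sum of the four losses (equal weights assumed)."""
    return (binary_cross_entropy(rpn_p, rpn_y)       # 1. RPN objectness
            + smooth_l1(rpn_deltas, rpn_targets)     # 2. RPN box regression
            + cross_entropy(cls_logits, cls_target)  # 3. RoI classification
            + smooth_l1(box_deltas, box_targets))    # 4. RoI box regression


zero = [0.0, 0.0, 0.0, 0.0]
print(detector_loss(0.5, 1, zero, zero, [0.0, 0.0], 0, zero, zero))  # 2 * ln(2), about 1.386
```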

And depending on the network size → regularization may also be needed.

The classical approach to keypoint detection → was to use → hand-designed models → these are not robust and do not generalize well.

Hence, starting from 2014 → people began to use deep learning models for keypoint detection.

Convolutional Pose Machines → build on the earlier pose machines → but now predict heatmaps with a CNN → and the prediction is refined iteratively over several stages. (The original pose machines were developed even before batch normalization existed.)
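"Done with heatmaps" means the network outputs, for each keypoint, a map that should peak where the keypoint is, instead of regressing (x, y) directly. A minimal sketch of a Gaussian target heatmap and argmax decoding; the Gaussian width `sigma` and the helper names are assumptions for illustration.

```python
import math
from typing import List, Tuple

Heatmap = List[List[float]]  # H rows x W columns


def gaussian_heatmap(h: int, w: int, kx: float, ky: float, sigma: float = 1.0) -> Heatmap:
    """Target heatmap: a 2D Gaussian centred on keypoint (kx, ky), peak value 1."""
    return [[math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def decode_heatmap(hm: Heatmap) -> Tuple[int, int]:
    """Predicted keypoint = location (x, y) of the heatmap maximum."""
    _, x, y = max((v, x, y) for y, row in enumerate(hm) for x, v in enumerate(row))
    return x, y


hm = gaussian_heatmap(8, 8, kx=3, ky=5)
print(decode_heatmap(hm))  # → (3, 5)
```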

In any case, iterative refinement is generally a good idea.

The next model → also has an iterative process (this might be a good approach) → however → could recurrent connections replace the iterative approach?

There are stacked autoencoder-like (encoder-decoder) networks → and after these → a simple encoder-decoder that upsamples with deconvolutional layers. (Even with a plain L2 loss on the heatmaps, it worked fairly well!)
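The "plain L2 loss" here is the mean squared error between predicted and target heatmaps. Iterative or stacked models typically apply the same loss to every stage's output (intermediate supervision) and sum the results. A minimal sketch assuming heatmaps are nested lists shaped [keypoint][row][col]; the function names and toy values are hypothetical.

```python
from typing import List, Sequence

Heatmap = List[List[float]]  # H rows x W columns


def heatmap_l2_loss(pred: Sequence[Heatmap], target: Sequence[Heatmap]) -> float:
    """Mean squared error over all keypoint heatmaps and pixels."""
    total, count = 0.0, 0
    for p_map, t_map in zip(pred, target):
        for p_row, t_row in zip(p_map, t_map):
            for p, t in zip(p_row, t_row):
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


def multi_stage_loss(stage_preds: Sequence[Sequence[Heatmap]],
                     target: Sequence[Heatmap]) -> float:
    """Intermediate supervision: every stage is compared with the same target."""
    return sum(heatmap_l2_loss(pred, target) for pred in stage_preds)


target = [[[0.0, 0.0], [0.0, 1.0]]]  # one keypoint, 2 x 2 heatmap
stage1 = [[[0.0, 0.0], [0.0, 0.0]]]  # misses the peak
stage2 = [[[0.0, 0.0], [0.0, 0.8]]]  # closer after refinement
print(heatmap_l2_loss(stage1, target), heatmap_l2_loss(stage2, target))
print(multi_stage_loss([stage1, stage2], target))
```

Supervising every stage gives the early stages a direct gradient signal instead of relying only on what flows back from the final output.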

Other networks use optical flow → to improve pose estimation in videos.