Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning

Source: Deep Learning on Medium

Let’s find similar key points in similar objects.

So they stay the same across viewing angles and more. (annotating them by hand is very hard to do).

There has been work on autoencoders to extract those keypoints automatically.

They directly optimize the problem → super cool.

Very good and powerful → they have some kind of keypoint net.

Backprop through the → SVD → this is critical. (the pose is recovered with an analytical solution). (super cool tracking).
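To make the SVD step concrete, here is a minimal NumPy sketch of the classic orthogonal Procrustes (Kabsch) solution that recovers a rotation from two matched sets of 3D keypoints. It only shows the forward computation; the function name `estimate_rotation` and the toy data are assumptions for illustration, not the authors' code, and in a real training setup the same steps would be written in an autodiff framework so gradients can flow through the SVD.

```python
import numpy as np

def estimate_rotation(points_a, points_b):
    """Least-squares rotation R with R @ a_i ~= b_i (orthogonal Procrustes / Kabsch)."""
    # Centre both keypoint sets so that only the rotation is left to solve.
    a = points_a - points_a.mean(axis=0)
    b = points_b - points_b.mean(axis=0)
    # 3x3 cross-covariance between the two centred sets.
    h = a.T @ b
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    correction = np.diag([1.0, 1.0, d])
    return vt.T @ correction @ u.T

# Toy check: rotate random keypoints by 30 degrees about z and recover the rotation.
rng = np.random.default_rng(0)
keypoints_a = rng.normal(size=(10, 3))
theta = np.pi / 6
true_r = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
keypoints_b = keypoints_a @ true_r.T
recovered = estimate_rotation(keypoints_a, keypoints_b)
```

Because the solution is closed-form, there is no inner optimization loop to unroll, which is what makes it practical to sit inside an end-to-end loss.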

Find good keypoints → these can be used in a downstream task → but we can also directly optimize for that downstream objective.

Multiple loss functions → are generally good for performance-boosting → human annotations can be wrong and are not consistent. (and a lot of tasks are done after keypoint detection → for a downstream task).

What they do is → auto key-point detection. (and optimize directly).

Some of the discovered parts → are very good. (and they achieve better results).

Computer vision applications → there are a lot of security applications. (this is good). (multiple researchers → use different kinds of losses → to make these work → so loss function engineering is the key idea.)

Or use different types of data → such as ego-motion and more. (learning a good structure representation is the goal of the research).

This is how it is done → via → transformation → and then keypoint loss functions. (they have a lot of applications).

They do not have any annotation → rather → just the keypoint detection and optimization of those points. (super interesting).

So this is a traditional computer vision problem → where those transformations are given → and since we know the transformation, we can use it in the loss function.
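A rough sketch of how a known transformation can enter a loss: move the keypoints predicted in one view into the other view with the given rigid transform, then penalize the distance to the keypoints predicted there. This is a simplified version working directly on 3D points (no camera projection to pixels), and the function name and arguments are hypothetical, not taken from the paper.

```python
import numpy as np

def multiview_consistency_loss(kp_a, kp_b, rotation, translation):
    """Mean squared distance between view-A keypoints moved into view B and view-B keypoints.

    kp_a, kp_b: (N, 3) keypoints predicted independently in each view.
    rotation (3, 3), translation (3,): the known rigid transform from view A to view B.
    """
    # Apply the known transform to every view-A keypoint.
    kp_a_in_b = kp_a @ rotation.T + translation
    # Matching keypoints should land on top of each other.
    return float(np.mean(np.sum((kp_a_in_b - kp_b) ** 2, axis=1)))

# Toy check: perfectly consistent predictions give (numerically) zero loss.
rot = np.array([[0.0, -1.0, 0.0],
                [1.0,  0.0, 0.0],
                [0.0,  0.0, 1.0]])
t = np.array([0.1, 0.2, 0.3])
kp_a = np.eye(3)
kp_b = kp_a @ rot.T + t
consistent_loss = multiview_consistency_loss(kp_a, kp_b, rot, t)
shifted_loss = multiview_consistency_loss(kp_a, kp_b + 0.1, rot, t)
```

No keypoint labels appear anywhere in this loss; the supervision comes only from the transformation between the two views.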

The network outputs keypoints → and our goal is to use those keypoints for unsupervised learning. (learning interesting keypoints).

And they use a CNN pose machine style approach → a softmax probability map! (super good idea). (learning global relations is critical → we can use methods such as dominant direction → these are very interesting computer vision problems) → so cool.
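The softmax probability map idea can be sketched as a "soft-argmax": turn a score map into a distribution over pixels, then take the expected pixel coordinate as the keypoint location, which stays differentiable unlike a hard argmax. The helper name `soft_argmax_2d` and the 8x8 toy map below are assumptions for illustration only.

```python
import numpy as np

def soft_argmax_2d(logits):
    """Turn an (H, W) score map into a probability map and its expected (row, col)."""
    h, w = logits.shape
    flat = logits.reshape(-1)
    flat = flat - flat.max()          # subtract the max for numerical stability
    prob = np.exp(flat)
    prob = (prob / prob.sum()).reshape(h, w)
    # Pixel coordinate grids; the keypoint is the probability-weighted average.
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    expected = np.array([np.sum(prob * rows), np.sum(prob * cols)])
    return prob, expected

# Toy check: a single strong peak at (2, 5) pulls the expectation onto that pixel.
scores = np.zeros((8, 8))
scores[2, 5] = 50.0
prob_map, keypoint = soft_argmax_2d(scores)
```

One map like this per keypoint gives one 2D location per keypoint; a flatter map would pull the expectation toward the image centre, so sharp peaks matter.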

There should not be an overlap between points and the points should be inside the object. (useful features of the extracted keypoints). (they used the ShapeNet dataset → wonder how this will translate to the real world).
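The "no overlap" requirement can be written as a simple separation penalty: any pair of keypoints closer than some margin gets penalized, pairs farther apart cost nothing. The sketch below is one hinge-style way to do that; the function name and the margin value `delta=0.05` are assumptions chosen for illustration, and the "inside the object" constraint is not covered here.

```python
import numpy as np

def separation_loss(keypoints, delta=0.05):
    """Penalize pairs of (N, 3) keypoints whose distance is below the margin delta."""
    n = keypoints.shape[0]
    # Pairwise squared distances between all keypoints, shape (N, N).
    diff = keypoints[:, None, :] - keypoints[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Hinge: only pairs closer than delta contribute.
    penalty = np.maximum(0.0, delta ** 2 - sq_dist)
    penalty[np.arange(n), np.arange(n)] = 0.0   # ignore each point paired with itself
    return float(penalty.sum() / (n * n))

# Toy check: spread-out points cost nothing, collapsed points are penalized.
spread = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
collapsed = np.zeros((2, 3))
spread_loss = separation_loss(spread)
collapsed_loss = separation_loss(collapsed)
```

Without a term like this, a consistency-only objective could be satisfied by predicting every keypoint at the same location.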

Adam with 200k training epochs → so it needs a lot of time for optimization. (they created their own test data → for a fair comparison). (interestingly → the unsupervised model did better than the supervised one).

Even able to generalize to new objects that the model has never seen before. (this work can be improved via → domain adaptation) → and it would be really cool to see this method applied to real-world images.