C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion

Source: Deep Learning on Medium

The method factorizes viewpoint from shape → this is done by adding a regularizer that pushes the reconstructed shapes toward a canonical pose. (3D scene reconstruction from images is a mature field → but the problem at hand is still really hard.)

Modern 3D reconstruction systems → rely on special hardware → so they cannot be used everywhere. (This model has no ground-truth 3D keypoints to train on → it is trained without that information → impressive!)

There are multiple networks → one does the reconstruction and another does the canonicalization → quite a training procedure → interesting. (seems hard to train)
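To make the canonicalization idea a bit more concrete, here is a minimal sketch of how such a loss could look: rotate a reconstructed shape randomly, ask a second function to undo the rotation, and penalize the difference. All names (`canonicalization_loss`, `make_procrustes_psi`, `psi`) are my own and hypothetical, and the Procrustes-based `psi` is only a stand-in for a learned network, not the authors' implementation.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))      # make the decomposition unique
    if np.linalg.det(q) < 0:         # force a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q

def canonicalization_loss(X, psi, rng, n_samples=8):
    """Mean absolute error between X and psi(R @ X) over random rotations R.

    X   : (3, K) array, a reconstructed shape with K keypoints.
    psi : callable mapping a rotated (3, K) shape back to a canonical one.
    """
    losses = []
    for _ in range(n_samples):
        R = random_rotation(rng)
        losses.append(np.mean(np.abs(psi(R @ X) - X)))
    return float(np.mean(losses))

def make_procrustes_psi(reference):
    """Stand-in for a learned canonicalizer (assumption, not a network):
    rotate the input so that it best aligns with a fixed reference shape."""
    def psi(X_rot):
        U, _, Vt = np.linalg.svd(X_rot @ reference.T)
        d = np.sign(np.linalg.det(Vt.T @ U.T))   # avoid reflections
        R_align = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        return R_align @ X_rot
    return psi

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 12))                 # toy shape, 12 keypoints

loss_ideal = canonicalization_loss(X, make_procrustes_psi(X), rng)
loss_identity = canonicalization_loss(X, lambda Z: Z, rng)
print(loss_ideal, loss_identity)
```

A canonicalizer that really undoes the rotation drives the loss to (numerically) zero, while doing nothing leaves it large. In the actual method the canonicalizer would be a trained network, so this only illustrates the shape of the objective.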

Recovering 3D structure and viewpoint from 2D images → is a hard problem (since there is deformation as well as occlusion). (There seem to be multiple approaches → but deep learning is now the state of the art) → the other approaches are not scalable and do not perform as well.

Some methods are very sensitive to initialization → and the authors' approach is related to weakly supervised training.

A very complicated system → it not only factorizes viewpoint but also reconstructs the object → (it can handle occlusion, a robust solution).
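One simple way occlusion can be handled in a reprojection loss is to mask out the keypoints that are not visible. The sketch below assumes per-keypoint visibility flags and a pseudo-Huber penalty; the function names and the exact form of the loss are my assumptions, not taken from the authors' code.

```python
import numpy as np

def pseudo_huber(residual, delta=1.0):
    """Smooth robust penalty: quadratic near zero, linear for large residuals."""
    return delta ** 2 * (np.sqrt(1.0 + (residual / delta) ** 2) - 1.0)

def masked_reprojection_loss(Y_obs, Y_pred, visible, delta=1.0):
    """Average robust 2D error over visible keypoints only.

    Y_obs, Y_pred : (2, K) observed and reprojected 2D keypoints.
    visible       : (K,) flags, 1 for visible and 0 for occluded.
    """
    v = np.asarray(visible, dtype=float)
    dist = np.linalg.norm(Y_obs - Y_pred, axis=0)   # per-keypoint 2D distance
    per_keypoint = pseudo_huber(dist, delta) * v    # occluded points contribute 0
    return float(per_keypoint.sum() / max(v.sum(), 1.0))

# Toy example: the third keypoint is occluded and its observation is garbage.
Y_pred = np.array([[0.0, 1.0, 2.0],
                   [0.0, 1.0, 2.0]])
Y_obs = Y_pred.copy()
Y_obs[:, 2] = 99.0
visible = np.array([1, 1, 0])

loss = masked_reprojection_loss(Y_obs, Y_pred, visible)
print(loss)
```

Because the occluded keypoint is masked out, its bogus observation does not affect the loss, which is why the model can still be trained on partially visible objects.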

Quite a lot of vector and matrix math is needed to formulate the problem → most of it is standard in computer vision: rotation, translation, and projection of coordinates. (but with it they are able to perform reconstruction with motion in mind → very cool)
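A rough sketch of the kind of math involved, assuming the usual non-rigid structure-from-motion setup: a 3D shape is a linear combination of basis shapes, and the 2D keypoints are an orthographic projection of the rotated shape. The names (`shape_from_basis`, `project`) and the sizes `D` and `K` are hypothetical and chosen only for illustration.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))      # make the decomposition unique
    if np.linalg.det(q) < 0:         # force a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q

def shape_from_basis(alpha, basis):
    """X = sum_d alpha[d] * basis[d]; basis is (D, 3, K), result is (3, K)."""
    return np.tensordot(alpha, basis, axes=1)

def project(R, X):
    """Orthographic camera: rotate the shape, then drop the depth row."""
    return (R @ X)[:2]

rng = np.random.default_rng(0)
D, K = 4, 10                              # basis size and keypoint count (toy values)
basis = rng.standard_normal((D, 3, K))    # shape basis
alpha = rng.standard_normal(D)            # per-view shape coefficients
R = random_rotation(rng)                  # per-view camera rotation

X = shape_from_basis(alpha, basis)        # 3D shape for this view
Y = project(R, X)                         # observed 2D keypoints, (2, K)

# The ambiguity: a different rotation/shape pair explains the same 2D data.
Q = random_rotation(rng)
Y_alt = project(R @ Q.T, Q @ X)
print(np.allclose(Y, Y_alt))
```

The last lines show why viewpoint and shape need to be factorized carefully: rotating the shape by `Q` and the camera by the inverse gives exactly the same 2D keypoints, so the 2D data alone cannot tell the two apart.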

The general structure of the model is built into the network → which is good. (a good amount of theoretical background is given) → so each network acts to disentangle one factor of variation in the data.

The learned 3D model seems very realistic → impressive that this was done without ground-truth values. (the transformation matrix seems to provide enough information for the model to be trained).

So they were able to train a model that estimates the 3D keypoints → this was done by viewpoint factorization. (very impressive)

Quite surprised that this trains successfully. (it does not seem like there is that much information to train on).