Source: Deep Learning on Medium
Unsupervised Learning of Object Keypoints for Perception and Control
Much image research has focused on classification → here, however, the aim is to learn a representation for RL and control. (Long-term tracking is critical, and learning a good representation for the RL agent opens up a lot of applications.)
A good representation can be reused in different settings, which is valuable since it lets us build up knowledge.
This CNN → is built specifically for keypoint detection for control (a very interesting use case). Their method learns more accurate keypoints than other methods → this is an important building block for RL tasks. It is, however, very specific to RL use cases. (It is robust to size and object) → which would be good for a lot of computer vision tasks.
Their method learns more spatial keypoints. (There have been multiple works on unsupervised object keypoint detection → at their core they use autoencoders.) This method → is a bit different in that they do not need a specific form of data.
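To make "keypoints from a CNN" a bit more concrete, here is a minimal sketch of one common way to turn feature maps into keypoint coordinates: a spatial softmax followed by an expected-coordinate (soft-argmax) readout. This is an assumption for illustration, not necessarily the exact readout used in the paper; the function name `soft_argmax_keypoints` and the [-1, 1] coordinate convention are hypothetical choices.

```python
import numpy as np

def soft_argmax_keypoints(feature_maps):
    """Map K feature maps of shape (K, H, W) to K (x, y) keypoints in [-1, 1].

    Illustrative sketch only: each map is normalised with a softmax over all
    pixels, and the keypoint is the expected pixel coordinate under it.
    """
    k, h, w = feature_maps.shape
    flat = feature_maps.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # subtract max for numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(k, h, w)

    xs = np.linspace(-1.0, 1.0, w)  # column coordinates
    ys = np.linspace(-1.0, 1.0, h)  # row coordinates
    x = (probs.sum(axis=1) * xs).sum(axis=1)  # marginal over rows, then expectation
    y = (probs.sum(axis=2) * ys).sum(axis=1)  # marginal over columns, then expectation
    return np.stack([x, y], axis=1)

# Usage: two 80x80 maps, each with one strong activation.
maps = np.zeros((2, 80, 80))
maps[0, 20, 60] = 50.0
maps[1, 40, 10] = 50.0
keypoints = soft_argmax_keypoints(maps)  # shape (2, 2), one (x, y) per map
```

Because the readout is an expectation rather than a hard argmax, it stays differentiable, which is why this kind of layer is popular when keypoints have to be learned end to end without labels.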
I did not know that RL made so much use of object keypoint detection.
The problem is formulated in a way that is critical for the RL application (robotic control). Their method is not a generative model → and it is good for long-term tracking.
But the problem with RL → training has to be data efficient. (They use the Q-learning framework.)
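For reference, the core of Q-learning is a single update rule. The sketch below shows the textbook tabular version; the paper works with neural networks on top of keypoint features, so treat this only as an illustration of the update itself. The function name and the `alpha` / `gamma` defaults are assumptions.

```python
import numpy as np

def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the table.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # No bootstrapping from the next state once the episode has ended.
    target = reward if done else reward + gamma * q_table[next_state].max()
    q_table[state, action] += alpha * (target - q_table[state, action])
    return q_table

# Usage: 2 states, 2 actions (toy numbers, purely illustrative).
q = np.zeros((2, 2))
q[1] = [1.0, 2.0]
q_learning_update(q, state=0, action=1, reward=1.0, next_state=1, done=False,
                  alpha=0.5, gamma=0.9)
```

Each update only needs one transition (state, action, reward, next state), which is part of why Q-learning pairs naturally with stored experience.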
Even when there is a gap in time → the keypoint tracking is very reliable (this is very good). So this paper is a combination of deep learning and RL → quite a complicated architecture, since they also have a replay buffer to work with.
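A replay buffer is just a bounded store of past transitions that training samples from at random. A minimal sketch, assuming uniform sampling and a fixed capacity (the class name `ReplayBuffer` and its methods are hypothetical, not taken from the paper):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sample without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

# Usage: capacity 3, so only the 3 most recent transitions are kept.
buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.add(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
batch = buf.sample(2)
```

Sampling old transitions at random breaks the correlation between consecutive frames and lets each frame be reused many times, which helps with the data-efficiency concern above.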
The authors' method → is very good for long-term tracking (this is important in real-life RL). RL is now going further than just playing games → tracking an object in the real world from scratch can be a great use case.
A CNN → is used to encode the image features → and an RNN is used to perform control. (Super interesting → looks very hard to optimize.)
The image size was set to 80×80 pixels, which is pretty small (the authors' method → really does well). The model needs 400,000 frames → that might seem like a lot of data, yet it is a good amount of data for RL.
The random options policy is very interesting. (The agent learns without reward.)
They can even learn without any reward → that is very powerful. (Ego-motion can be used as well.)