Eagles and Algos: Data and Dollars
This is part 5 of a six part series. Links to the other articles in this series can be found at the bottom of this page.
We have seen how agents can be trained to perform novel tasks by observing and imitating a single demonstration or a handful of demonstrations, as well as by trial and error. However, no matter how small these requisite training samples are, they still add up to a substantial amount of data. This makes adequately training agents both time-intensive and capital-intensive. The issue is further compounded by the paucity of robotics-specific labeled data suitable for training agents. For example, in order to train the Fetch robotic arm to stack blocks, the research team would have had to set up hundreds of different configurations of those blocks in the physical world for the agent to observe. This takes up a chunk of time and is far from ideal.
What if there was a way for an agent to learn in a simulated environment and execute the model in the physical realm? Just like Neo learning martial arts in a simulator to fight Morpheus in the Matrix.
That is exactly what domain randomization does. It is a technique that lets an agent learn a policy in a virtual reality simulation that transfers seamlessly to the real world. In robotics, policies can be broken down into three primary types:
1. Perception: ability to see its environment
2. State (pose) estimation: ability to accurately locate objects relative to each other
3. Control: ability to pick, grasp and drop items
Impressive research has been done to better understand how to simulate each one of these policies and transfer the learnings to the real world. Some amazing work has come out of Berkeley and OpenAI, focusing specifically on perception, state estimation, and control.
It is worth mentioning the high degree of difficulty in tuning the parameters of a simulated environment to accurately represent the real world. It is a time-intensive exercise that is also susceptible to errors. And there are some physical effects, such as fluid dynamics, that still cannot be modeled in existing simulators. This discrepancy between a simulator and the physical realm is called a reality gap and it presents a non-trivial hurdle to using simulated data to teach robots that have to operate in the real world. Domain randomization bridges this reality gap by exposing the agent, during training, to a wide spectrum of variations of the simulated environment. Given a high enough number of variations, the real world appears to the agent as just another instance of the simulator. When combined with the previously discussed learning models, this offers the possibility of building teachable robots that can be trained by anybody, regardless of technical competence. All you need is a VR headset with a few demonstrations and your newly trained robot is on its way.
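The core idea is simple to sketch in code: instead of tuning one simulator to match reality, you randomize the simulator's parameters on every training episode. The sketch below is illustrative only; the parameter names and ranges are made up for the example and do not correspond to any particular simulator's API.

```python
import random
from dataclasses import dataclass

@dataclass
class SimConfig:
    """One randomized instance of the simulated environment (hypothetical parameters)."""
    table_texture: int      # index into a bank of random textures
    light_intensity: float  # scalar brightness of the scene lighting
    camera_jitter: float    # meters of random noise added to the camera position
    object_mass: float      # kg, varied so the policy cannot overfit to one value
    friction: float         # sliding friction coefficient of the table surface

def sample_config(rng: random.Random) -> SimConfig:
    """Draw one variation of the environment; the ranges here are arbitrary."""
    return SimConfig(
        table_texture=rng.randrange(1000),
        light_intensity=rng.uniform(0.2, 2.0),
        camera_jitter=rng.uniform(0.0, 0.05),
        object_mass=rng.uniform(0.05, 0.5),
        friction=rng.uniform(0.3, 1.2),
    )

def training_configs(n_episodes: int, seed: int = 0):
    """Yield a freshly randomized environment for every training episode.

    With enough variation across episodes, the real world ends up looking
    to the agent like just one more sample from this distribution.
    """
    rng = random.Random(seed)
    for _ in range(n_episodes):
        yield sample_config(rng)
```

In practice, the policy is trained across thousands of such configurations, so it learns features that are invariant to texture, lighting, and physics perturbations rather than features specific to any single simulated scene.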
Putting It All Together
As the field of machine learning and its applications grow exponentially, there is an increasing need for a cohesive approach to training agents. The models and techniques discussed in this article should be seen as key joints that make up a lattice framework for optimal learning, all connected by a need for data efficiency. It is equally important to understand their core differences to better appreciate how they might complement each other. The table below offers a snapshot of these differences:
Finally, we must acknowledge that despite all the progress made in machine learning over the past twenty years, there is still work to be done. Nowhere is this more evident than in environments that possess a high degree of variability. Under such circumstances, machine learning tends to break down. Additionally, while machine agents are good at observation and imitation (replicating the what), they still lag in intuition (understanding the why behind actions). This presents one of the next frontiers in machine learning. Max Kanter and Kalyan Veeramachaneni have done fantastic work on machine intuition if you care to read more.