Eagles and Algos: Reinforcement Learning
This is part 4 of a six part series. Links to the other articles in this series can be found at the bottom of this page.
One-shot imitation learning can be considered both an alternative and a complement to reinforcement learning. First explored in depth by Richard Sutton and Andrew Barto, reinforcement learning offers skill acquisition through trial and error, without human intervention. Beyond the time-intensive nature of trial-and-error learning, reinforcement learning requires the specification of a reward function that defines the task’s optimal end state. Intuitively, this is more arduous than simply demonstrating the task. The real promise, however, lies in combining both approaches: imitation learning can kick-start an agent’s learning process and then hand off to reinforcement learning. This creates a favorable asymmetry between the resources put into training an agent and the performance that comes out — a single short demonstration, followed by autonomous refinement.
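One way to picture this kick-start is a tabular Q-learner whose value table is seeded from a single demonstration before trial-and-error refinement begins. The sketch below is purely illustrative — the toy corridor environment, the `bonus` used to encode the demonstration, and all hyperparameters are assumptions for the example, not a reference implementation:

```python
import random

# Toy 1-D corridor: states 0..4, goal at state 4.
# Actions: 0 = move left, 1 = move right. Reward 1 only on reaching the goal.
N_STATES, GOAL = 5, 4
ACTIONS = [0, 1]

def step(state, action):
    """Environment dynamics: move left or right; reward on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def init_q_from_demo(demo, bonus=0.5):
    """Kick-start: give demonstrated (state, action) pairs an optimistic value,
    so the agent's first trials follow the demonstration instead of wandering."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for s, a in demo:
        q[s][a] = bonus
    return q

def q_learning(q, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Standard tabular Q-learning: refine the seeded table by trial and error."""
    rng = random.Random(0)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[s][act])
            s2, r, done = step(s, a)
            # One-step temporal-difference update.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

# A single demonstration of the task: always move right toward the goal.
demo = [(s, 1) for s in range(GOAL)]
q = q_learning(init_q_from_demo(demo))
policy = [max(ACTIONS, key=lambda act: q[s][act]) for s in range(N_STATES)]
print(policy)  # greedy policy moves right in every non-goal state
```

The demonstration here only biases the initial value estimates; the reinforcement-learning loop is free to override it wherever trial and error finds something better, which is exactly the handoff the combined approach relies on.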
By way of analogy, remember when your golf instructor showed you how to get out of a sand trap? Next time, you might have to hit a ball out of the rough while she is nowhere to be found. You might fail a few times before you strike the ball well enough to dislodge it. Your ability to figure this out rests on your reinforcement learning model, which was bootstrapped by an imitation learning model. One-shot imitation learning presents an opportunity to drastically reduce the time it takes to demonstrate a task to an agent, while reinforcement learning eventually eliminates the need for demonstrations altogether. Together, they maximize an agent’s expected performance across a variety of tasks.