Original article was published on Artificial Intelligence on Medium
Our journey to the top 10 in the AWS DeepRacer F1 Competition
Reinforcement learning is a cool boutique side of Artificial Intelligence, and AWS DeepRacer democratises it. Here are a few lessons we learnt on our journey with DeepRacer.
Written by Akram Dweikat
Deep Reinforcement Learning is a bunch of techniques for training up computer agents to solve tasks with minimal help from their makers. They all work by having the agent automatically try things, see what works, and remember. When Reinforcement Learning works it is awesome: learning to fly stunt copters, beating the world's best at mind games, making robot arms move, and, of course, playing hide-and-seek. But most of the time it is nearly impossible to get Reinforcement Learning to work, or even to find a useful application for it.
AWS DeepRacer is a competition in which the aim is to get a race car to go around a track as quickly as possible (initially in simulation). AWS provides an easy interface, simulations, and learning algorithms that let anyone get into Reinforcement Learning quickly. Beginners are tasked mostly with designing a “reward function”: this is code that specifies when you will reward your agent for doing well, similar to how you might train a dog to come to you by first giving it a treat that you show it, then giving it a treat after you call “come”, and so forth. But actually you’re not teaching a dog to come, rather you’re teaching it to walk. And the dog has only one eye, can’t smell or feel or hear, and has wheels, and is a car.
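To make the idea of a reward function concrete, here is a minimal sketch of one that rewards the car for staying near the centre of the track. It assumes the simulator calls `reward_function(params)` on every step and that `params` carries the keys `all_wheels_on_track`, `distance_from_center` and `track_width`; the linear scaling and the `1e-3` floor are illustrative choices of ours, not something prescribed by AWS.

```python
def reward_function(params):
    """Reward the car for staying on the track and close to the centre line."""
    # Inputs assumed to be supplied by the simulator on every step.
    all_wheels_on_track = params["all_wheels_on_track"]
    distance_from_center = params["distance_from_center"]
    track_width = params["track_width"]

    # Leaving the track earns almost nothing.
    if not all_wheels_on_track:
        return 1e-3

    # Scale linearly from 1.0 on the centre line down to the floor at the edge.
    half_width = 0.5 * track_width
    closeness = max(0.0, 1.0 - distance_from_center / half_width)
    return float(max(closeness, 1e-3))
```

The treat is the number returned: the bigger it is, the more the agent is encouraged to repeat whatever it just did.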
When AWS announced a month of free compute time on DeepRacer as a part of their F1-partnered racing competition, several of us at Daemon Solutions jumped on the opportunity to compete in the time-trial challenge, had a lot of fun, and (although we say so ourselves!) did quite well for newcomers, reaching the top 10.
Lesson #1: no fun if not together
Success in DeepRacer is not about winning. People who come to it with an open mind find:
- A challenging introduction to ML that anyone with a bit of programming experience can join.
- Fun and team-building in a cooperative or competitive environment, or both. We worked with people and on topics we normally would not, and constantly learnt from each other.
- Experience with cooperation on ML projects (this can be subtly different from other software engineering projects).
But of course, the reverse is true too: cooperation is great for technical success in these frontier areas. The key to our early high ranking was exactly this sharing of ideas, strategies, and results.
Lesson #2: get under the hood
The AWS interface is kind and gentle but a bit like teaching a toddler to fly an airplane using a cardboard box held up by a parent (optionally, with “forward”, “back” and “turn” buttons in crayon on the inside). It is possible to build on this gentle introduction in various ways. Here’s what you should try first:
- There is an active and supportive community built up around DeepRacer, including a Slack channel. Join it, read, learn, and contribute.
- Set up the log analysis solution to understand exactly in which way your reward function is incentivising your car to do backflips.
- Consider making life easier for yourself by training your car to follow a particular racing line to start with.
- Understand the contents of the S3 bucket DeepRacer makes for you so that you can inspect and alter metadata for your learnt robot.
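The racing-line suggestion in the list above can be sketched roughly as follows. Everything specific here is an assumption for illustration: `RACING_LINE` is a made-up handful of points rather than a real track, `MAX_DISTANCE` is an arbitrary tolerance, and the sketch relies on `params` exposing the car position as `x` and `y` alongside `all_wheels_on_track`. For a closed circuit you would also repeat the first point at the end of the line.

```python
import math

# Hypothetical racing line: a handful of (x, y) points in track coordinates.
RACING_LINE = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.5)]
MAX_DISTANCE = 0.5  # metres; beyond this the reward bottoms out (an assumption)


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p, line=RACING_LINE):
    """Shortest distance from point p to any segment of the racing line."""
    return min(distance_to_segment(p, a, b) for a, b in zip(line, line[1:]))


def reward_function(params):
    """Reward the car for staying close to the chosen racing line."""
    if not params["all_wheels_on_track"]:
        return 1e-3
    d = distance_to_line((params["x"], params["y"]))
    return float(max(1e-3, 1.0 - d / MAX_DISTANCE))
```

Because the car is told at every step how far it has strayed, feedback is immediate, which is what makes this an easier thing to learn than "get round the track somehow".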
Lesson #3: driving school
The default reward function and training parameters AWS provides you with work well, but there are many switches and knobs so it pays to know some theory when you start twiddling them. In other words, a computer agent is a bit different in unexpected ways from the aforementioned toddler and dog, so you kind of need to get inside its head a bit.
- The “discount factor” determines how far ahead the agent looks when judging a decision: the closer it is to 1, the more weight later rewards carry when crediting earlier actions. If your reward function gives your agent feedback very promptly for its decisions (for example, if you are asking the agent to follow a particular line around the track) then this discount factor can be lowered and your agent can learn much faster.
- A larger track means more to remember: a larger memory buffer (“number of episodes per iteration”) helps learn difficult concepts more reliably.
- Like the famous riddle, you really need to meditate on the impact of time. Reward is handed out at every step, so an agent that simply avoids crashing keeps accumulating it; that is why almost any simple reward function can get it around the track.
- If you are trying to keep your agent within a well-defined area you don’t need as much exploration as the agent will learn plenty just staying within its little happy space — so you can decrease that “entropy” parameter (and increase the “learning rate” while you are at it because the thing you are asking your agent to learn is a lot easier).
- But, when not in Covid-19 induced lock-down, the aim is to compete with a real robot car. Moving from the simulation to the real world, you need to ensure your agent can generalise sufficiently, and this is a harder learning problem.
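To get a feel for the discount factor and the role of time from the list above, here is a small arithmetic sketch. It uses the textbook definition of a discounted return (each future reward weighted by the discount factor raised to the number of steps ahead) and the common rule of thumb that roughly 1 / (1 - discount factor) future steps matter. The function names and the 200-step episode are ours, chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma ** (steps into the future)."""
    return sum(r * gamma ** k for k, r in enumerate(rewards))


def effective_horizon(gamma):
    """Rough number of future steps that matter: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


# A reward of 1.0 on each of 200 steps, i.e. a car that simply stays on track.
rewards = [1.0] * 200

for gamma in (0.999, 0.99, 0.9):
    horizon = round(effective_horizon(gamma))
    total = round(discounted_return(rewards, gamma), 1)
    print(f"gamma={gamma}: horizon ~{horizon} steps, discounted return {total}")
```

With no discounting at all the return is just the number of steps survived, which is why merely staying on the track pays. Lowering the discount factor shortens the horizon, and that is only safe when the reward already tells the agent promptly whether a decision was good.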
It’s a lesson in humility and a joyful celebration of human intelligence and creativity that enthusiasts and novices can get into Reinforcement Learning with DeepRacer and directly make an impact. We love how challenges like this can bring people with disparate backgrounds quickly into Machine Learning, working shoulder to shoulder with ML specialists on equal terms.
Find out more about ML & AI at Daemon here.