Getting Started with Reinforcement Learning (RL) using AWS DeepRacer



A real-life example of reinforcement learning

Never heard of Reinforcement Learning? It’s a newer type of machine learning technique compared with Supervised and Unsupervised Learning.

Reinforcement Learning (RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences, or, in simple terms, to learn from its mistakes. And as most of us know, machine learning techniques are not so easy to grasp. Here’s where AWS DeepRacer comes in handy: it gives you an interesting and fun way to get started with reinforcement learning (RL).

What is AWS DeepRacer?

AWS DeepRacer is the fastest way to get rolling with reinforcement learning (RL), literally, with a fully autonomous 1/18th-scale race car driven by reinforcement learning, a 3D racing simulator, and a global racing league. Developers can train, evaluate, and tune RL models in the online simulator, deploy their models onto AWS DeepRacer for a real-world autonomous driving experience, and compete in the AWS DeepRacer League for a chance to win the AWS DeepRacer Championship Cup.

Also, if self-driving cars fascinate you, you’re going to enjoy learning with DeepRacer even more.

Getting Started

Visit the AWS DeepRacer website to create your account if you don’t have one, or just sign in to your AWS Console and navigate to DeepRacer.
Perform the initial setup as directed in the console.

Let’s start learning

Look for Reinforcement Learning in the sidebar and choose the Get Started tab under it. Once you do, you’ll see a set of options. I recommend going through the basics of Reinforcement Learning, which you can access by performing Step 1.

Creating A Model

Choose Create Model from the set of options in Step 2; you’ll be redirected to the Create Model page. Pick a name for your model and choose a track to train it on (I’ll be using the re:Invent 2018 track). Then choose a training type (I’ll be using Time Trial) and an agent (The Original DeepRacer from the garage). Once you’ve selected these, you’ll be asked to customize the reward function and the training algorithm.
Now a new question arises: “What’s a reward function?”

A reward function describes how the agent (the car, in our case) “ought” to behave. In other words, it has “normative” content, stipulating what you want the agent to accomplish.
If we train a car’s model to follow a line, we can treat the distance from the line as a negative reward. The closer the model drives to the line, the more reward it earns; if it tends to move away from the line, it is punished.
The code editor present in the tab gives you some example reward functions. Feel free to play around with them, and don’t forget to validate the function before going to the next step. (If this is your first time, I recommend seeing the results with the default example reward function and revisiting it if anything new strikes your mind.) A minimal sketch of such a function is shown below.
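For reference, here is a minimal center-line-following reward function in the shape the DeepRacer console expects. The `params` keys used here (`track_width`, `distance_from_center`) are standard console inputs, but the band thresholds are illustrative assumptions, not a tuned solution:

```python
def reward_function(params):
    """Reward the car for staying close to the center line.

    A sketch of the classic "follow the center line" example:
    params is the dict the DeepRacer environment passes in on
    every step; only two of its standard keys are used here.
    """
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Three bands around the center line; the thresholds
    # (10%, 25%, 50% of track width) are illustrative, not tuned.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0    # hugging the line: full reward
    elif distance_from_center <= marker_2:
        reward = 0.5    # drifting a bit: partial reward
    elif distance_from_center <= marker_3:
        reward = 0.1    # near the edge: small reward
    else:
        reward = 1e-3   # likely off track: effectively punished

    return float(reward)
```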

The Proximal Policy Optimization (PPO) algorithm will be used as the training algorithm for the agent, as it performs comparably to or better than state-of-the-art approaches while being much simpler to implement and tune. A toy sketch of the objective it optimizes follows.
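To give a flavour of what PPO optimizes (the console handles all of this for you), here is a toy sketch of its clipped surrogate objective; the function name and the epsilon default are illustrative choices, not DeepRacer internals:

```python
import numpy as np

def ppo_clipped_objective(prob_ratio, advantage, epsilon=0.2):
    """Toy sketch of PPO's clipped surrogate objective.

    prob_ratio: pi_new(a|s) / pi_old(a|s) for sampled actions.
    advantage:  estimated advantage of those actions.
    epsilon:    clip range (0.2 is a common default).
    """
    unclipped = prob_ratio * advantage
    clipped = np.clip(prob_ratio, 1 - epsilon, 1 + epsilon) * advantage
    # Taking the element-wise minimum keeps policy updates
    # conservative, which is what makes PPO stable and easy to tune.
    return np.minimum(unclipped, clipped).mean()
```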

The DeepRacer console allows us to tune the hyperparameters, the variables that control the reinforcement learning training. They can be tuned to optimize training time and model performance (run with the default parameters if it’s your first time). The console also lets you set a stop condition, which is intended to prevent unnecessarily long-running and costly training: you can set a maximum time in minutes that a training session can last. A sketch of these knobs appears below.
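As an illustration, the training knobs the console exposes look roughly like the following. The values shown are, to the best of my knowledge, the console’s usual defaults, so treat them as a starting point rather than gospel:

```python
# Hyperparameters exposed in the AWS DeepRacer console
# (values shown are typical defaults; verify in the console).
hyperparameters = {
    "batch_size": 64,             # gradient descent batch size
    "num_epochs": 10,             # passes over experience per update
    "learning_rate": 0.0003,      # step size for gradient descent
    "entropy": 0.01,              # exploration bonus
    "discount_factor": 0.999,     # weight on future rewards
    "loss_type": "huber",         # alternative: mean squared error
    "num_episodes_between_training": 20,  # experience per policy update
}

# Stop condition: cap training time to control cost, e.g. 60 minutes.
stop_condition_minutes = 60
```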
Let’s now hit the Create model button. It can take up to 6 minutes or more, depending on the hyperparameters and reward function supplied to the agent.

My Experience:

My best lap time was 18.24 secs.

Some points to note:

  1. People tend to write big, complex reward functions, but in reality the reward function does not need to be complex.
  2. Do not overtrain the model; it will start to misbehave (overfitting to the training track).

I’ll be glad if you share the best lap time you got using this tutorial in the comments below.