A Gentle Introduction to Reinforcement Learning

In this article, we are going to learn about reinforcement learning!

What is Reinforcement Learning, for a layman?

Reinforcement learning is an important type of machine learning that focuses on learning from mistakes: an agent improves its decision-making by interacting with the world and evaluating the feedback it gets. Just like us humans, when we do a task or play a game, we make mistakes, we learn, and we adapt. In fact, that’s how we actually learn just about everything. Reinforcement learning uses the same approach to train a model.

Say we are put in a new environment. At first, we’ll make mistakes and fail at some points, but we’ll learn from them. We all know the great story “The King and the Spider”.

A small spider was trying to weave a web across a cave ceiling. It failed again and again, but it kept climbing back up to try once more. The moral of the story is “the spider did not give up”. But I think it also found a way: “how to climb”.

As Thomas Edison famously said:

I have not failed. I’ve just found 10,000 ways that won’t work.

Eventually, he found the right path to constructing the electric light bulb.

Just a little background:

Computers that can play games have always impressed the computing world. In December 2013, a small group of AI researchers from a London-based company called DeepMind released a groundbreaking paper called “Playing Atari with Deep Reinforcement Learning”, and just over a month later, Google announced that it had bought DeepMind for a really big sum of money. Since then, there’s been all kinds of talk about reinforcement learning in the field of AI. In January of 2016, Google announced that the appropriately named AlphaGo had beaten Fan Hui, the European Go champion, and two months later it beat Lee Sedol, one of the world’s top players. We’re going to take the mystery out of reinforcement learning so you can see how all these amazing feats are possible.

More on reinforcement learning:

The story of reinforcement learning goes all the way back to AI, animal psychology, and control theory. At its heart, it involves an autonomous agent, like a person, animal, robot, or deep net, learning to navigate an uncertain environment with the goal of maximizing a numerical reward.

Sports are a great example of this. Just think of what our autonomous agent would have to deal with in a tennis match. The agent would have to consider its actions, like its serves, returns, and volleys. These actions change the state of the game: the current set, the leading player, things like that. And every action is performed with a reward in mind: winning a point, in order to win the game, set, and match. Our agent needs to follow a policy, or a set of rules and strategies, in order to maximize the final score.
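To make this vocabulary concrete, here is a minimal Python sketch of the agent-environment loop. It is not a tennis simulator: `LineWorld`, `always_right`, `always_left`, and `run_episode` are hypothetical names, and the reward values are made up. The environment holds the state, the policy maps a state to an action, and every step hands back a reward that the agent adds up.

```python
class LineWorld:
    """Hypothetical toy environment: the agent walks along positions 0..goal."""

    def __init__(self, goal=3):
        self.goal = goal
        self.state = 0

    def reset(self):
        """Start a new episode at position 0 and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return (state, reward, done)."""
        self.state = max(0, self.state + action)  # cannot walk past the left wall
        done = self.state == self.goal
        reward = 1.0 if done else -0.1  # bonus at the goal, small cost for every other step
        return self.state, reward, done


def always_right(state):
    """A fixed policy: whatever the state, choose 'move right'."""
    return +1


def always_left(state):
    """A poor policy that never reaches the goal."""
    return -1


def run_episode(env, policy, max_steps=20):
    """Let the policy act in the environment and add up the rewards it collects."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # policy: state -> action
        state, reward, done = env.step(action)  # environment: new state and reward
        total_reward += reward
        if done:
            break
    return total_reward
```

With these made-up rewards, `run_episode(LineWorld(), always_right)` collects roughly 0.8 (two small step costs, then the goal bonus), while `always_left` never reaches the goal and ends with roughly -2.0 after 20 steps. A better policy earns a higher total, and finding such a policy is what the agent is trying to do.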

But if you were building an autonomous agent, how would you actually model this? We know that the agent’s actions will change the state of the environment. So a model would need to take a state and an action as input and generate the maximum expected reward as output. But since that only gets you to the next state, you’ll need to take into account the total expected reward for every action from the current state until the end state. How this works differs for every application, and you’re probably not surprised to learn that building a tennis agent is different from building an Atari agent.
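The “total expected reward from the current state until the end state” is commonly written as a discounted return. The sketch below assumes a discount factor `gamma` (not mentioned in the text above), which makes rewards further in the future count a little less; `discounted_returns` is a hypothetical helper that works backwards from the end of one finished episode.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t for every time step t of one finished episode.

    G_t = rewards[t] + gamma * G_{t+1}, computed backwards from the end state,
    so each step's value includes everything that happens after it.
    """
    returns = [0.0] * len(rewards)
    running = 0.0  # return of the step after the end state is zero
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, `discounted_returns([0.0, 0.0, 1.0], gamma=0.5)` returns `[0.25, 0.5, 1.0]`: the final reward is worth less the further away it is. Working backwards lets each step reuse the return already computed for the next step instead of re-summing the whole tail.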

Deep Atari:

The researchers at DeepMind fed a series of Atari screenshots into a convolutional neural network, with a couple of tweaks. The output wasn’t a class; instead, it was a number estimating the maximum expected reward, so the network was actually doing regression, not classification. They also didn’t use pooling layers, since, unlike in image recognition, the individual positions of game objects, like the player, all matter and can’t be reduced. A recurrent net could have been used too, as long as the output layer was tailored for regression and the input at each time step included the action and the environment state.
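Here is a sketch of what “regression, not classification” means in practice, assuming the common DQN design in which the network outputs one estimated value per action. The network itself is left out: `q_s` and `q_next` are made-up numbers standing in for its outputs, and `td_target` and `squared_error` are hypothetical helpers. The target is the reward plus the discounted best value of the next state, and the loss is a plain squared error on a real number.

```python
def td_target(reward, next_q_values, gamma=0.99, terminal=False):
    """Regression target for Q(s, a): reward now plus discounted best value next."""
    if terminal:
        return reward  # nothing follows the end state
    return reward + gamma * max(next_q_values)


def squared_error(prediction, target):
    """Regression loss on a real-valued output (no softmax, no class labels)."""
    return (prediction - target) ** 2


# Hypothetical network outputs: one estimated value per action.
q_s = [0.2, 0.5, 0.1]     # stands in for Q(s, .) on the current screen
q_next = [0.5, 1.0, 0.0]  # stands in for Q(s', .) on the next screen
action, reward = 1, 1.0

target = td_target(reward, q_next, gamma=0.5)  # 1.0 + 0.5 * 1.0
loss = squared_error(q_s[action], target)      # (0.5 - 1.5) ** 2
```

Training nudges `q_s[action]` toward `target`, exactly like fitting a regression model to a moving label.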

Deep Q-Net:

There’s also the Deep Q-Network, or DQN for short. The DQN also works on the principle of predicting the maximum expected reward given a state and an action. It was actually patented by Google. It relies on techniques such as experience replay, and it has since seen many improvements, like the dueling network architecture. We will discuss these in the next article, so let’s keep in touch.
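Experience replay is simple enough to sketch in a few lines. This is a minimal, assumed version (the `ReplayBuffer` class and its methods are hypothetical, not a library API): the agent stores past transitions and trains on random minibatches drawn from them, rather than only on the most recent, highly correlated frames.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal experience replay: store transitions, sample random minibatches."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        """Record one step of experience."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        """Draw batch_size distinct transitions uniformly at random."""
        # Uniform sampling breaks up correlations between consecutive frames.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible, which is handy for testing.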

Supervised learning is stateless:

Reinforcement learning isn’t just a fancy, smart-sounding way to say supervised learning. Supervised learning is all about making sense of the environment based on historical examples. But that isn’t always the best way to do things. Imagine trying to drive a car in heavy traffic based on the road patterns you observed the week before, when the roads were clear. That’s about as effective as driving while looking only at the rear-view mirror. Reinforcement learning, on the other hand, is all about reward. You get points for your actions, like staying in your lane, driving under the speed limit, and signaling when you are supposed to. But you can also lose points for dangerous actions like tailgating and speeding. Your objective is to get the maximum number of points possible given the current state of the traffic on the road around you. Reinforcement learning emphasizes that an action results in a change of state, which is something a supervised learning model doesn’t focus on.
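A reward signal for the driving example could look something like the function below. Everything here is an assumption for illustration: `driving_reward` is a hypothetical name and the point values are invented. Notice there is no labeled “correct answer” as in supervised learning; the agent only receives a score for what it did in the current state and has to learn which actions lead to high totals over time.

```python
def driving_reward(in_lane, speed, speed_limit, signaled_properly, tailgating):
    """Hypothetical per-step reward for the driving example; point values are invented."""
    reward = 0.0
    reward += 1.0 if in_lane else -1.0               # staying in your lane
    reward += 1.0 if speed <= speed_limit else -2.0  # speeding costs points
    reward += 0.5 if signaled_properly else 0.0      # signaling when supposed to
    reward -= 2.0 if tailgating else 0.0             # dangerous following distance
    return reward
```

Under these made-up values, careful driving (in lane, under the limit, signaling, not tailgating) scores 2.5 per step, while reckless driving scores -5.0.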

Exploration vs Exploitation:

In April of 2016, Amazon founder Jeff Bezos talked about how his company is a great place to fail, and how most companies are unwilling to suffer through “the string of failed experiments”. You can think of this as a statement about rewards. Most organizations operate in the realm of conventional wisdom, which means exploiting what is known to achieve finite rewards with known odds.

Some groups venture into the unknown and explore new territory with the prospect of outsized rewards at long odds. And many of these organizations do fail! But some of them succeed and end up changing the world. With reinforcement learning, an agent can weigh the trade-off between exploration and exploitation and choose the path with the maximum expected reward.
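The classic toy version of this trade-off is the multi-armed bandit, and one common strategy for it is epsilon-greedy. The sketch below is one such strategy, not the only one: with probability `epsilon` the agent explores a random option, otherwise it exploits the option with the best estimated reward so far. The arm payoffs (`true_means`), the Gaussian noise, and the function name are all assumptions for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Estimate each arm's value while balancing exploration and exploitation.

    With probability epsilon the agent explores (picks a random arm);
    otherwise it exploits the arm with the highest estimated value.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward per arm
    counts = [0] * n_arms       # how often each arm was pulled
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payoff around the true mean
        counts[arm] += 1
        # Incremental average of the rewards seen so far for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward
```

With `true_means=[0.0, 0.5, 2.0]`, the agent spends a small fraction of its time exploring, discovers that the third arm pays best, and pulls it most often. Setting `epsilon=0` can leave it stuck on whichever arm happened to look good first, which is the cost of never exploring.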

Final Thoughts:

This article is all about deep learning, so we focused on building a deep reinforcement learning net. But reinforcement learning falls under the broader umbrella of artificial intelligence. It involves topics like goal setting, planning, and perception, and it can even form a bridge between AI and engineering disciplines. Reinforcement learning is simple and powerful, and given the recent advances, it has the potential to become a big force in the field of deep learning. If you want to learn more about reinforcement learning, see the references below.


  1. Deep Reinforcement Learning: Pong from Pixels
  2. Playing Atari with Deep Reinforcement Learning
  3. https://becominghuman.ai/lets-build-an-atari-ai-part-1-dqn-df57e8ff3b26

Source: Deep Learning on Medium