An Introduction to Reinforcement Learning

Original article was published on Artificial Intelligence on Medium

Photo by Alex Knight on Unsplash

“Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last, unless we learn how to avoid the risks.” — Stephen Hawking

Today everybody is talking about Artificial Intelligence (AI), and many people want to get into the field and work on their own ideas. But not everyone is fully aware of what is happening or how it works. So in this blog, we will try to demystify AI and give a basic introduction to Reinforcement Learning, which is a category of Machine Learning.

Let us start with machine learning, which is a subset of AI. Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.

Moving towards Reinforcement Learning

The idea of reinforcement learning is inspired by ideas about how the human brain learns, so there are many analogies between the two. Reinforcement learning (RL) is a trial-and-error method in which an agent performs actions chosen by its policy, guided by a value function, and receives a reward from the environment for each action. The reward indicates whether the action was a good move or a bad one. The agent's only objective is to maximize the total reward it receives over time.

The key terms used in RL are:

  • Agent: The learner that makes decisions and takes actions.
  • Environment: The world the agent interacts with and learns from.
  • Action: An operation the agent performs in the environment.
  • Reward: The signal the environment gives the agent in response to an action.
  • State: The situation of the agent in the environment at a particular moment.
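To make these terms concrete, here is a minimal sketch of the agent-environment loop. The `CorridorEnv` class, its reward values, and the `run_episode` helper are all hypothetical names made up for illustration; they are not part of any library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, with the goal at position 4."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); position is clamped to [0, 4]
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        # assumed reward scheme: small penalty per step, bonus on reaching the goal
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, choose_action, max_steps=50):
    """Let an agent (any function state -> action) interact with the environment."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(state)            # the agent acts
        state, reward, done = env.step(action)   # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
random_total = run_episode(CorridorEnv(), lambda state: rng.choice([-1, 1]))
right_total = run_episode(CorridorEnv(), lambda state: 1)
print("random agent:", random_total)
print("always-right agent:", right_total)
```

The agent that always moves right reaches the goal in the fewest steps, so it never collects less reward than the random agent in this toy setup.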

Elements of Reinforcement Learning

  • Policy
  • Reward Signal
  • Value Function
  • Model (optional)

Policy –

  • It defines a mapping from states of the environment to the actions to be taken when the agent is in those states.
  • The policy is the core of a reinforcement learning agent, in the sense that it alone is sufficient to determine the behaviour of the agent.
  • It corresponds to what psychology calls stimulus-response rules or associations.
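In its simplest form, a policy can be sketched as a lookup table from states to actions. The traffic-light states and action names below are hypothetical and chosen only for illustration; real policies are often stochastic or represented by a learned function.

```python
# Hypothetical states and actions for a toy traffic-light world.
policy = {
    "red": "stop",
    "yellow": "slow_down",
    "green": "go",
}


def act(state):
    """Return the action the policy prescribes for the given state."""
    return policy[state]


for state in ["green", "yellow", "red"]:
    print(state, "->", act(state))
```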

Reward Signal –

  • The environment sends a reward signal to the agent in response to each action.
  • The agent treats an action as good if the reward is high and as bad if it is low.
  • The reward signal is the primary basis for changing the policy: if an action chosen by the policy yields a low reward, the policy may be changed to select a different action in the same situation in the future.
  • In human terms, the reward signal is analogous to the experience of pleasure or pain.
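One simple way to picture how rewards change a policy is to keep a running average of the reward seen for each action and prefer the action with the highest average. This is only a sketch under that assumption; the names `record` and `preferred_action` and the state `"s0"` are hypothetical.

```python
from collections import defaultdict

estimates = defaultdict(float)  # (state, action) -> average reward seen so far
counts = defaultdict(int)       # (state, action) -> number of times tried


def record(state, action, reward):
    """Update the running average reward for taking `action` in `state`."""
    key = (state, action)
    counts[key] += 1
    estimates[key] += (reward - estimates[key]) / counts[key]


def preferred_action(state, actions):
    """Pick the action with the highest average reward (ties go to the first)."""
    return max(actions, key=lambda a: estimates[(state, a)])


actions = ["left", "right"]
before = preferred_action("s0", actions)  # nothing tried yet, so the first action wins
record("s0", "left", -1.0)                # "left" turned out badly
after = preferred_action("s0", actions)   # the preference moves away from "left"
print(before, "->", after)
```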

Value Function –

  • The reward signal indicates what is good in an immediate sense while the value function specifies what is good in the long run.
  • The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that point.
  • The most important component of an RL algorithm is a method for estimating values efficiently, so that the agent can obtain the maximum reward.
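The difference between immediate reward and long-run value can be sketched by summing the rewards along a path. The discount factor `gamma`, which weights later rewards less, is an assumption added here and is not discussed in this article; the two reward sequences are made up for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences observed when following two different paths.
path_a = [1.0, 0.0, 0.0]  # good immediate reward, nothing afterwards
path_b = [0.0, 0.0, 5.0]  # no immediate reward, large reward later

print("path A:", discounted_return(path_a))
print("path B:", discounted_return(path_b))
```

Path A looks better if only the first reward is considered, but path B has the higher long-run return, which is the kind of information a value function is meant to capture.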

Model –

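  • It is the agent's internal model of the environment.
  • It mimics the behaviour of the environment.
  • The model lets the agent make inferences about how the environment will behave, for example by predicting the next state and reward for a given state and action.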
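A model can be sketched as a table that predicts the next state and reward for each state-action pair, which lets the agent compare actions without actually trying them. The states, actions, and rewards below are hypothetical, and a real model would usually be learned and may be probabilistic.

```python
# Hypothetical model of a tiny world: (state, action) -> (next_state, reward)
model = {
    ("start", "left"): ("pit", -1.0),
    ("start", "right"): ("goal", 1.0),
}


def predict(state, action):
    """Ask the model what would happen, without acting in the real environment."""
    return model[(state, action)]


def plan_one_step(state, actions):
    """Choose the action whose predicted immediate reward is highest."""
    return max(actions, key=lambda a: predict(state, a)[1])


print(plan_one_step("start", ["left", "right"]))
```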

Challenges of Reinforcement Learning

The most important challenge is the trade-off between Exploration and Exploitation.

  • Exploitation: The agent has to exploit the knowledge it has gained from past experience in order to obtain the maximum reward.
  • Exploration: The agent needs to explore different actions so that it can make better action selections in the future and accumulate more reward in the long run, so it has to try a variety of actions.

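The problem is that neither can be pursued exclusively: an agent that only exploits may never discover better actions, and an agent that only explores never benefits from what it has learned. So there has to be a trade-off between them.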
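One common way to balance the two is the epsilon-greedy rule, which this article does not cover, so treat the following as an illustrative sketch: with probability `epsilon` the agent explores a random action, otherwise it exploits its current best estimate. The multi-armed bandit setup, the Gaussian rewards, and all function names are assumptions made for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Play a hypothetical bandit whose arms pay noisy rewards around true_means."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)  # estimated average reward per arm
    n = [0] * len(true_means)    # number of pulls per arm
    total = 0.0
    for _ in range(steps):
        arm = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]  # running average update
        total += reward
    return q, n, total


q, n, total = run_bandit([0.0, 1.0, 2.0])
print("estimates:", [round(v, 2) for v in q])
print("pulls per arm:", n)
print("total reward:", round(total, 1))
```

Setting `epsilon=0.0` gives a purely exploiting agent and `epsilon=1.0` a purely exploring one, so trying a few values in between is an easy way to see the trade-off.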

Applications of Reinforcement Learning

One of the major applications is autonomous cars. Many companies are focused on building them, and reinforcement learning is one of the approaches being explored for their driving decisions.