Reinforcement Learning

Original article was published on Artificial Intelligence on Medium

Reinforcement learning is an area of machine learning concerned with taking suitable actions to maximize reward in a particular situation. Software and machines use it to find the best possible behaviour or path to take in a specific situation. It differs from supervised learning in one key way: in supervised learning the training data comes with an answer key, so the model is trained on the correct answers, whereas in reinforcement learning there is no answer key and the agent itself decides what to do to perform the given task. In the absence of a training dataset, it must learn from its own experience.
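
To make "learning from experience" concrete, here is a minimal sketch of trial and error. It is only an illustration under stated assumptions: the three actions, their fixed rewards and the exploration rate `epsilon` are hypothetical and do not come from the article. The agent is never told which action is correct. It only sees the reward that follows each choice and keeps a running average per action.

```python
import random

# Hypothetical environment: each action always pays a fixed reward.
# The agent does not get to read this table; it only observes rewards.
HYPOTHETICAL_REWARDS = {"left": 0.0, "stay": 0.2, "right": 1.0}


def run_trial_and_error(rewards, episodes=200, epsilon=0.1, seed=0):
    """Estimate each action's reward purely from experience."""
    rng = random.Random(seed)
    actions = list(rewards)
    estimates = {a: 0.0 for a in actions}  # running average reward per action
    counts = {a: 0 for a in actions}       # how often each action was tried
    for _ in range(episodes):
        untried = [a for a in actions if counts[a] == 0]
        if untried:
            action = untried[0]                       # try everything once
        elif rng.random() < epsilon:
            action = rng.choice(actions)              # explore at random
        else:
            action = max(actions, key=estimates.get)  # exploit best so far
        reward = rewards[action]  # feedback from the environment, not a label
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_trial_and_error(HYPOTHETICAL_REWARDS)
best_action = max(estimates, key=estimates.get)
print(best_action, counts)
```

After a short exploration phase the agent spends most of its time on the action that has paid the most so far, which is the behaviour the paragraph above describes.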

Types of Reinforcement:

There are two kinds of reinforcement:


Positive Reinforcement is defined as an event that occurs because of a specific behaviour. It increases the strength and frequency of that behaviour and has a positive effect on the action taken by the agent.

This type of reinforcement helps to maximize performance and sustain change over a longer period. However, too much reinforcement may lead to over-optimization of a state, which can degrade the results.


Negative Reinforcement is defined as the strengthening of a behaviour that occurs because a negative condition is stopped or avoided. It helps to define the minimum standard of performance. However, the drawback of this method is that it provides only enough to meet that minimum behaviour.

Types of Learning Models of Reinforcement

There are two important learning models in reinforcement learning:

Markov Decision Process:

In reinforcement learning, an agent must decide the best action to take based on its current state. When this decision step is repeated, the problem is known as a Markov Decision Process. The following parameters are used to describe the problem and reach a solution:

-Set of actions- A

-Set of states- S

-Reward- R

-Policy- π

-Value- V
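
The parameters above can be sketched on a toy problem. The example below is an illustration only, not the article's method: the four-state line world, the reward of 1 for reaching the goal and the discount factor `GAMMA` are all assumptions made for this sketch. It uses value iteration, one standard way of solving a small Markov Decision Process, to compute the value V of every state and then reads off a policy.

```python
STATES = [0, 1, 2, 3]        # S: positions on a line; state 3 is the goal
ACTIONS = ["left", "right"]  # A
GOAL = 3
GAMMA = 0.9                  # discount factor: an assumption, not from the text


def step(state, action):
    """Deterministic toy dynamics: return (next_state, reward)."""
    if state == GOAL:
        return state, 0.0  # the goal is terminal, nothing more to earn
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == GOAL else 0.0  # R: paid on reaching the goal
    return next_state, reward


def action_value(values, state, action):
    """Reward now plus the discounted value of where the action leads."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tolerance=1e-9):
    values = {s: 0.0 for s in STATES}  # V: start from all zeros
    while True:
        biggest_change = 0.0
        for s in STATES:
            if s == GOAL:
                continue
            best = max(action_value(values, s, a) for a in ACTIONS)
            biggest_change = max(biggest_change, abs(best - values[s]))
            values[s] = best
        if biggest_change < tolerance:
            break
    # Policy: in each state, pick the action with the highest value.
    policy = {
        s: max(ACTIONS, key=lambda a: action_value(values, s, a))
        for s in STATES if s != GOAL
    }
    return values, policy


values, policy = value_iteration()
print(values)
print(policy)
```

In this toy world the values shrink by a factor of `GAMMA` for every extra step away from the goal, so the resulting policy moves right in every state.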

Q-learning:

Q-learning is a value-based method that tells an agent which action to take. In Q-learning, paths that lead directly to the goal receive the highest rewards, and paths that do not connect directly to the goal receive the lowest. The agent then chooses its action according to these values.
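
A minimal sketch of tabular Q-learning follows, assuming a hypothetical four-state line world in which reaching state 3 pays a reward of 1. The learning rate `alpha`, discount `gamma`, exploration rate `epsilon` and episode count are illustrative choices, not values from the article. The agent keeps a table of values, one per (state, action) pair, and nudges each entry toward the reward it received plus the discounted best value of the next state.

```python
import random

ACTIONS = ["left", "right"]
GOAL = 3  # states are 0..3 on a line; reaching state 3 ends the episode


def step(state, action):
    """Hypothetical environment: return (next_state, reward)."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def choose_action(q, state, rng, epsilon):
    """Epsilon-greedy: usually the best-known action, sometimes a random one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties at random so the untrained agent does not favour one side.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # safety cap on episode length
            action = choose_action(q, state, rng, epsilon)
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move toward reward + discounted best next value.
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if state == GOAL:
                break
    return q


q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(greedy)
```

Reading the table greedily gives the learned behaviour: the action that leads toward the goal ends up with the larger value in every state, so the agent heads right.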