Original article can be found here (source): Artificial Intelligence on Medium
What is RL and where does it fit?
Now that you have a little context on the history and significance of reinforcement learning, let’s dive deeper into how RL algorithms actually work. RL is neither fully supervised nor fully unsupervised. Think of it as sitting between clustering (completely unsupervised: there are no labels, and the algorithm learns the relationships in the data on its own) and classification (completely supervised: the algorithm is given both data and labels). At the far end of the spectrum are rules-based engines, like Deep Blue, that don’t involve any learning.
The goal of an RL algorithm is to maximize future rewards. The algorithm doesn’t know initially what actions to take, but in the process of trying to maximize rewards, it learns which actions lead to them. This process can be distilled to the simple concept of trial-and-error learning. When the RL algorithm does something good, it is rewarded. When it does something bad, it is penalized. In this way, the RL algorithm is guided in its learning process. It is important to note that the RL algorithm takes actions on its own, moving about the environment (at first, randomly) as it learns to maximize rewards. A helpful analogy for the different types of machine learning is training a dog.
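Trial-and-error learning can be sketched in a few lines of Python. This is a made-up toy, not code from the article: the agent repeatedly tries one of three actions, observes a reward, and keeps a running value estimate for each action. The action names and reward probabilities are invented for illustration, and the epsilon-greedy rule (explore occasionally, otherwise exploit the best-known action) is one common way to balance trying new things against using what has been learned.

```python
import random

# Hypothetical reward probabilities per action (invented for this sketch).
REWARD_PROB = {"sit": 0.8, "bark": 0.2, "roll": 0.5}

def run_bandit(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values = {a: 0.0 for a in REWARD_PROB}  # estimated value per action
    counts = {a: 0 for a in REWARD_PROB}    # times each action was tried
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best-known action.
        if rng.random() < epsilon:
            action = rng.choice(list(REWARD_PROB))
        else:
            action = max(values, key=values.get)
        # Trial: take the action. Error signal: reward of 1 or 0.
        reward = 1.0 if rng.random() < REWARD_PROB[action] else 0.0
        counts[action] += 1
        # Update the running average reward for the chosen action.
        values[action] += (reward - values[action]) / counts[action]
    return values

print(run_bandit())
```

After enough trials, the estimated values approach the true reward probabilities, so the agent ends up preferring the most rewarding action without ever being told which one it was.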
Rules engines are traditional computer science, similar to hard-coding instructions into a robot dog. The robot dog is not truly learning any tricks; it is simply executing instructions. Deep Blue is a great example of a rules engine.
Supervised learning is like teaching a dog how to do something by example. Teaching by example is similar to providing an algorithm with labeled data to learn from.
Unsupervised learning is like a dog learning from other dogs. The dog has no direct supervision, but it learns from the other dogs (the data points) around it.
Reinforcement learning is similar to teaching a dog how to behave by rewarding it with treats. The dog may not know how to behave initially, but it learns to associate certain actions with rewards in the form of treats.
The terminology and process of RL are quite different from those of other types of machine learning. Instead of features and labels, you have an agent and an environment. Sound complicated? The key vocabulary terms for RL are below.
- Agent: Think of the agent as the algorithm. It takes actions in the environment.
- Environment: This is where the agent exists, operates, and takes actions.
- State: The situation the agent is in at a given moment in time. It changes based on the actions the agent takes.
- Actions: The agent can take various actions to interact with the environment.
- Rewards: Rewards are feedback that the agent receives from the environment after taking an action. Rewards can be positive or negative.
In summary, an agent takes an action, that action changes the state of the environment, and the environment returns a reward. The agent then learns from the rewards it receives.