The exploration vs. exploitation dilemma is one of the fundamental balancing acts in deep reinforcement learning applications. How many resources should an agent devote to acquiring knowledge that can improve future actions, versus performing specific actions it already knows how to take? This trade-off is one of the main heuristics that govern the behavior of reinforcement learning systems. In theory, optimal exploration should always lead to more efficient knowledge acquisition, but this is far from true in the real world. Developing techniques to improve the exploration of an environment is one of the pivotal challenges of the current generation of deep reinforcement learning models. Recently, researchers from OpenAI published a research paper that proposes a very original approach to improving the exploratory capability of reinforcement learning algorithms by doing nothing more than introducing noise.
To understand the challenge with exploration in deep reinforcement learning systems, think about researchers who spend decades in a lab without producing results with any practical application. Similarly, reinforcement learning agents can spend a disproportionate amount of resources without producing a behavior that converges to a local optimum. This happens more often than you think, because the exploration model is not directly correlated to the reward of the underlying process. The OpenAI team believes that the exploratory capability of deep reinforcement learning models can be directly improved by introducing random levels of noise in the parameters of the model. Does it sound counterintuitive? Well, it shouldn’t. Consider the last time you learned a practical skill, such as a board game, by trial and error. I am sure you can recall instances in which you were challenging the conditions of the environment (such as the game rules) in order to solidify your knowledge. That’s effectively introducing noise in the input dataset.
The OpenAI approach is not the first technique that proposes to improve exploration by introducing noise in a deep learning model. However, most of its predecessors focused on what are known as Action-Space-Noise approaches, which introduce noise to change the likelihoods associated with each action the agent might take from one moment to the next. With that approach, the agent is very likely to obtain a different action whenever the same state is sampled again in the rollout, since action-space noise is completely independent of the current state. OpenAI proposes an alternative, called Parameter-Space-Noise, that introduces noise into the model’s policy parameters at the beginning of each episode. Because the perturbed parameters stay fixed for the rest of the episode, the Parameter-Space-Noise technique almost guarantees that the same action will be applied every time the same state is sampled, which improves the exploratory capabilities of the model.
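To make the difference concrete, here is a minimal sketch in plain Python. It is not the OpenAI implementation: the toy linear policy, the function names (`policy`, `act_with_action_noise`, `perturb_parameters`), the dimensions and the noise scale of 0.1 are all assumptions I picked for illustration.

```python
import math
import random

def policy(weights, state):
    """Deterministic toy policy: one linear layer followed by tanh."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]

def act_with_action_noise(weights, state, sigma, rng):
    """Action-space noise: fresh Gaussian noise is added to every action."""
    return [a + rng.gauss(0.0, sigma) for a in policy(weights, state)]

def perturb_parameters(weights, sigma, rng):
    """Parameter-space noise: return a noisy copy of the policy weights."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]

rng = random.Random(0)
# Hypothetical sizes: a 4-dimensional state mapped to a 2-dimensional action.
weights = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]
state = [rng.gauss(0.0, 1.0) for _ in range(4)]

# Action-space noise: visiting the same state twice gives two different actions.
a1 = act_with_action_noise(weights, state, 0.1, rng)
a2 = act_with_action_noise(weights, state, 0.1, rng)

# Parameter-space noise: perturb once at the start of the episode, then act
# deterministically with the perturbed weights for the rest of the episode.
episode_weights = perturb_parameters(weights, 0.1, rng)
b1 = policy(episode_weights, state)
b2 = policy(episode_weights, state)

print("action-space noise, identical actions:", a1 == a2)
print("parameter-space noise, identical actions:", b1 == b2)
```

In this sketch the action-space variant draws new noise on every call, so the two actions for the same state differ, while the parameter-space variant reuses the same perturbed weights and therefore repeats its action exactly. The perturbed policy still differs from the unperturbed one, which is where the exploration comes from.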
The Parameter-Space-Noise technique works very nicely with existing exploration models in deep reinforcement learning algorithms. As with some of its predecessors, the OpenAI researchers encountered some challenges:
- Different layers of the network have different sensitivities to perturbations.
- The sensitivity of the policy’s weights may change over time while training progresses, making it hard for us to predict the actions the policy will take.
- Picking the right scale of noise is difficult because it is hard to intuitively understand how parameter noise influences a policy during training.
The research paper proposes solutions to tackle these challenges using well-known optimization techniques in the deep learning space.
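For the third challenge, one idea is to stop tuning the noise scale in parameter space and instead adjust it according to how much the perturbation changes the actions. The following is only a sketch of that idea under my own assumptions: the toy policy, the root-mean-square distance, the target of 0.1, the factor of 1.01 and every function name here are hypothetical, not taken from the OpenAI code.

```python
import math
import random

def policy(weights, state):
    """Deterministic toy policy: one linear layer followed by tanh."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]

def perturb_parameters(weights, sigma, rng):
    """Return a copy of the weights with Gaussian noise of scale sigma added."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]

def action_distance(weights, noisy_weights, states):
    """Root-mean-square gap between clean and perturbed actions over a batch."""
    squared_gaps = [
        (a - b) ** 2
        for state in states
        for a, b in zip(policy(weights, state), policy(noisy_weights, state))
    ]
    return math.sqrt(sum(squared_gaps) / len(squared_gaps))

def adapt_sigma(sigma, distance, target, factor=1.01):
    """Grow the noise scale if actions barely changed, shrink it otherwise."""
    return sigma * factor if distance < target else sigma / factor

rng = random.Random(0)
weights = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]
states = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(32)]

sigma, target = 1.0, 0.1  # deliberately start with too much noise
for episode in range(500):
    noisy_weights = perturb_parameters(weights, sigma, rng)
    distance = action_distance(weights, noisy_weights, states)
    sigma = adapt_sigma(sigma, distance, target)

print(f"noise scale after adaptation: {sigma:.3f}")
```

The appeal of this design is that the target is expressed in action space, which is easier to reason about than a raw standard deviation on the weights, and the scale keeps adjusting as the sensitivity of the policy changes during training.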
The initial results of the Parameter-Space-Noise model proved to be really promising. The technique helps algorithms explore their environments more effectively, leading to higher scores and more elegant behaviors. This seems to be related to the fact that adding noise in a deliberate manner to the parameters of the policy makes an agent’s exploration consistent across different timesteps. More importantly, the Parameter-Space-Noise technique is relatively simple to implement using the current generation of deep learning frameworks. The OpenAI team released an initial implementation as part of its reinforcement learning baselines.
What’s New in Deep Learning Research: Knowledge Exploration with Parameter Noise was originally published in Towards Data Science on Medium.
Source: Deep Learning on Medium