Biasing Exploration towards Positive Error for Efficient Reinforcement Learning

Authors

A. Parker & J. Sheppard

DOI:

https://doi.org/10.32473/flairs.38.1.138835

Keywords:

Reinforcement Learning, Deep Reinforcement Learning, Bandits

Abstract

Efficient exploration remains a critical challenge in Reinforcement Learning (RL) and significantly affects sample efficiency. This paper demonstrates that biasing exploration towards state-action pairs with positive temporal difference error speeds up convergence and, in some challenging environments, can yield an improved policy. We show that this Positive Error Bias (PEB) method achieves statistically significant performance improvements across a range of tasks and estimators. Empirical results demonstrate PEB’s effectiveness in bandits, grid worlds, and classic control tasks with both exact and approximate estimators. PEB is particularly effective when unbiased exploration struggles to discover a good policy.
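
The abstract does not include the algorithm itself; the listing below is a minimal, hypothetical sketch of the idea as described, applied to a multi-armed bandit: on exploration steps, preference is given to arms whose most recent update produced a positive prediction error (reward exceeded the current estimate). The function and parameter names (peb_bandit, last_err, epsilon, alpha) are illustrative assumptions, not taken from the paper, and this is not the authors' exact PEB procedure.

    import numpy as np

    def peb_bandit(true_means, steps=1000, epsilon=0.1, alpha=0.1, seed=0):
        """Epsilon-greedy bandit whose exploration is biased toward arms
        with positive recent prediction error (an assumed PEB-style sketch)."""
        rng = np.random.default_rng(seed)
        k = len(true_means)
        q = np.zeros(k)          # value estimates per arm
        last_err = np.zeros(k)   # most recent prediction error per arm
        rewards = []
        for _ in range(steps):
            if rng.random() < epsilon:
                # Biased exploration: prefer arms whose last update had
                # positive error; fall back to uniform if none qualify.
                positive = np.flatnonzero(last_err > 0)
                a = int(rng.choice(positive)) if positive.size else int(rng.integers(k))
            else:
                a = int(np.argmax(q))               # greedy action
            r = rng.normal(true_means[a], 1.0)      # sample a noisy reward
            err = r - q[a]                          # one-step prediction error
            q[a] += alpha * err                     # incremental value update
            last_err[a] = err
            rewards.append(r)
        return q, float(np.mean(rewards))

    if __name__ == "__main__":
        q_est, avg_r = peb_bandit([0.2, 0.5, 1.0])
        print("estimates:", np.round(q_est, 2), "average reward:", round(avg_r, 3))

In a full RL setting the same bias would be applied to the behavior policy over state-action pairs using the temporal difference error from the value update, rather than the per-arm error tracked here.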

Published

14-05-2025

How to Cite

Parker, A., & Sheppard, J. (2025). Biasing Exploration towards Positive Error for Efficient Reinforcement Learning. The International FLAIRS Conference Proceedings, 38(1). https://doi.org/10.32473/flairs.38.1.138835