Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning
DOI:
https://doi.org/10.32473/flairs.37.1.135567
Keywords:
sample efficient reinforcement learning, ensemble learning, bootstrapping, multi-head self attention
Abstract
We present a method for improving the sample efficiency of ensemble Q-learning. The approach integrates multi-head self-attention into the ensembled Q-networks and bootstraps the state-action pairs ingested by the ensemble. It improves performance over the original REDQ and its variant DroQ, and it reduces both the average and the standard deviation of the normalized bias within Q-function ensembles, which yields better Q predictions. The method also performs well at low update-to-data (UTD) ratios. Implementation is straightforward and requires only minimal modifications to the base model.
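The abstract does not spell out how the state-action pairs are bootstrapped, so the following is only a minimal sketch of one common reading: each ensemble member receives its own with-replacement resample of the minibatch. The function name `bootstrap_batches`, the array shapes, and the ensemble size are illustrative assumptions, not the authors' implementation, and the self-attention component is not shown.

```python
import numpy as np

def bootstrap_batches(states, actions, n_members, rng):
    """Draw one with-replacement resample of the batch per ensemble member.

    Hypothetical helper for illustration; not taken from the paper.
    """
    batch_size = states.shape[0]
    # idx[k] holds the row indices of the batch seen by ensemble member k.
    idx = rng.integers(0, batch_size, size=(n_members, batch_size))
    # Fancy indexing adds a leading ensemble axis: (n_members, batch, dim).
    return states[idx], actions[idx], idx

rng = np.random.default_rng(0)
states = rng.normal(size=(8, 3))   # 8 transitions, 3-dim states (made-up sizes)
actions = rng.normal(size=(8, 2))  # 2-dim actions (made-up size)
s_boot, a_boot, idx = bootstrap_batches(states, actions, n_members=4, rng=rng)
print(s_boot.shape, a_boot.shape)
```

Under this assumption, each Q-network in the ensemble would be trained on a slightly different view of the same minibatch, which is one way to decorrelate the members' estimates.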
License
Copyright (c) 2024 Muhammad Junaid Khan, Syed Hammad Ahmed, Gita Sukthankar

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.