COM-MABs: From Users' Feedback to Recommendation
DOI:
https://doi.org/10.32473/flairs.v35i.130560
Abstract
Recently, the COMbinatorial Multi-Armed Bandits (COM-MAB) problem has emerged as an active research field. In systems that interact with humans, these reinforcement learning approaches use a feedback strategy as their reward function. This paper makes three contributions to the study of such strategies: 1) we model a feedback strategy as a three-step process in which each step influences the agent's performance; 2) based on this model, we propose a novel Reward Computing process, BUSBC, which significantly increases the global accuracy reached by optimistic COM-MAB algorithms, by up to 16.2%; 3) we conduct an empirical analysis of our approach and of several feedback strategies from the literature on three real-world application datasets, confirming our propositions.
License
Copyright 2022 Alexandre Letard, Tassadit Amghar, Olivier Camp, Nicolas Gutowski
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.