COM-MABs: From Users' Feedback to Recommendation
DOI:
https://doi.org/10.32473/flairs.v35i.130560
Abstract
Recently, the COMbinatorial Multi-Armed Bandits (COM-MAB) problem has emerged as an active research field. In systems interacting with humans, these reinforcement learning approaches use a feedback strategy as their reward function. On the study of these strategies, this paper presents three contributions: 1) we model a feedback strategy as a three-step process, where each step influences the performance of an agent; 2) based on this model, we propose a novel Reward Computing process, BUSBC, which significantly increases the global accuracy reached by optimistic COM-MAB algorithms (by up to 16.2%); 3) we conduct an empirical analysis of our approach and several feedback strategies from the literature on three real-world application datasets, confirming our propositions.
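To make the setting concrete, here is a minimal sketch of an optimistic combinatorial bandit loop in which simulated user feedback serves directly as the reward. It is not BUSBC and does not reproduce the paper's three-step model, neither of which the abstract spells out. The UCB1-style index, the top-k selection, the Bernoulli click simulation, and all names (`ucb_index`, `select_super_arm`, `update`, `simulate`, `click_probs`) are assumptions chosen for illustration.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """Optimistic index: empirical mean plus an exploration bonus (UCB1-style, assumed)."""
    if pulls == 0:
        return float("inf")  # an untried arm is always selected first
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def select_super_arm(means, pulls, t, k):
    """Combinatorial step: recommend the k arms with the highest optimistic index."""
    order = sorted(
        range(len(means)),
        key=lambda a: ucb_index(means[a], pulls[a], t),
        reverse=True,
    )
    return order[:k]


def update(means, pulls, arm, reward):
    """Incremental update of the empirical mean reward of one arm."""
    pulls[arm] += 1
    means[arm] += (reward - means[arm]) / pulls[arm]


def simulate(click_probs, k, horizon, seed=0):
    """Run the loop against hypothetical users who click item a with probability click_probs[a]."""
    rng = random.Random(seed)
    n = len(click_probs)
    means = [0.0] * n
    pulls = [0] * n
    for t in range(1, horizon + 1):
        for arm in select_super_arm(means, pulls, t, k):
            # Simplest possible feedback strategy (an assumption):
            # a click counts as reward 1, no click as reward 0.
            feedback = 1.0 if rng.random() < click_probs[arm] else 0.0
            update(means, pulls, arm, feedback)
    return means, pulls


if __name__ == "__main__":
    means, pulls = simulate([0.9, 0.1, 0.5, 0.2], k=2, horizon=200)
    print("estimated means:", [round(m, 2) for m in means])
    print("pull counts:", pulls)
```

In this sketch the reward equals the raw click signal. A different reward computing process would change only the line that turns `feedback` into the value passed to `update`, which is the kind of design choice the abstract says affects an agent's performance.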
License
Copyright (c) 2022 Alexandre Letard, Tassadit Amghar, Olivier Camp, Nicolas Gutowski
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.