COM-MABs: From Users' Feedback to Recommendation
DOI:
https://doi.org/10.32473/flairs.v35i.130560Abstract
Recently, the COMbinatorial Multi-Armed Bandits (COM-MAB) problem has arisen as an active research field. In systems interacting with humans, those reinforcement learning approaches use a feedback strategy as their reward function. On the study of those strategies, this paper present three contributions: 1) We model a feedback strategy as a three-step process, where each step influences the performances of an agent ; 2) Based on this model, we propose a novel Reward Computing process, BUSBC, which significantly increases the global accuracy reached by optimistic COM-MAB algorithms -- up to 16.2\% -- ; 3) We conduct an empirical analysis of our approach and several feedback strategies from the literature on three real-world application datasets, confirming our propositions.
Downloads
Veröffentlicht
Zitationsvorschlag
Ausgabe
Rubrik
Lizenz
Copyright (c) 2022 Alexandre Letard, Tassadit Amghar, Olivier Camp, Nicolas Gutowski
Dieses Werk steht unter der Lizenz Creative Commons Namensnennung - Nicht-kommerziell 4.0 International.