Katalog Plus
Bibliothek der Frankfurt UAS
Bald neuer Katalog: sichern Sie sich schon vorab Ihre persönlichen Merklisten im Nutzerkonto: Anleitung.
Dieses Ergebnis aus BASE kann Gästen nicht angezeigt werden.  Login für vollen Zugriff.

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

Title: Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits
Authors: Besson, Lilian; Kaufmann, Emilie; Maillard, Odalric-Ambrym; Seznec, Julien
Contributors: École normale supérieure - Rennes (ENS Rennes); Scool (Scool); Centre Inria de l'Université de Lille; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL); Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS); Lelivrescolaire.fr; ANR-16-CE40-0002,BADASS,BANDITS MANCHOTS POUR SIGNAUX NON-STATIONNAIRES ET STRUCTURES(2016); ANR-19-CE23-0026,BOLD,Au delà de l'apprentissage séquentiel pour de meilleures prises de décisions(2019)
Source: ISSN: 1532-4435.
Publisher Information: CCSD; Microtome Publishing
Publication Year: 2022
Collection: LillOA (HAL Lille Open Archive, Université de Lille)
Subject Terms: Change Point Detection; Non-Stationary Bandits; Multi-Armed Bandits; [STAT.OT]Statistics [stat]/Other Statistics [stat.ML]
Description: International audience ; We introduce GLR-klUCB, a novel algorithm for the piecewise iid non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free, changepoint detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge on the arms' means. We prove that this algorithm can attain a $O(\sqrt{TA \Upsilon_T\log(T)})$ regret in $T$ rounds on some ``easy'' instances, where A is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances.
Document Type: article in journal/newspaper
Language: English
Relation: info:eu-repo/semantics/altIdentifier/arxiv/1902.01575; ARXIV: 1902.01575
Availability: https://inria.hal.science/hal-02006471; https://inria.hal.science/hal-02006471v3/document; https://inria.hal.science/hal-02006471v3/file/BKMS22%20%281%29.pdf
Rights: http://creativecommons.org/licenses/by-nc-sa/ ; info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.ABD41
Database: BASE