Paper ID: 2410.02279

On Lai's Upper Confidence Bound in Multi-Armed Bandits

Huachen Ren, Cun-Hui Zhang

In this memorial paper, we honor Tze Leung Lai's seminal contributions to the topic of multi-armed bandits, with a specific focus on his pioneering work on the upper confidence bound. We establish sharp non-asymptotic regret bounds for an upper confidence bound index with a constant level of exploration for Gaussian rewards. Furthermore, we establish a non-asymptotic regret bound for the upper confidence bound index of Lai (1987), which employs an exploration function that decreases with the sample size of the corresponding arm. Both regret bounds have leading constants that match the Lai-Robbins lower bound. Our results highlight an aspect of Lai's seminal works that deserves more attention in the machine learning literature.
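The two kinds of index contrasted in the abstract can be illustrated with a small simulation. The abstract does not give the exact index, so the form below is an assumption: for Gaussian rewards with known variance sigma^2, the index of arm i is its sample mean plus sigma * sqrt(2 * g / T_i), where T_i is the arm's sample size and n is the horizon. Here g = log(n) models a constant level of exploration, and g = log(n / T_i) models an exploration function that decreases with T_i, in the spirit of Lai's index. The helper names (`lai_robbins_constant`, `ucb_index`, `run_ucb`) are hypothetical. The Lai-Robbins constant uses the standard Gaussian form: each suboptimal arm with gap Delta_i contributes 2 * sigma^2 / Delta_i per unit of log(n).

```python
import math
import random


def lai_robbins_constant(means, sigma=1.0):
    """Leading constant of the Lai-Robbins lower bound for Gaussian arms with a
    known common variance sigma**2: sum over suboptimal arms of 2*sigma**2/gap."""
    best = max(means)
    return sum(2 * sigma**2 / (best - m) for m in means if m < best)


def ucb_index(mean, count, horizon, sigma=1.0, decreasing=False):
    """Hypothetical Gaussian UCB index: mean + sigma * sqrt(2 * g / count).

    decreasing=False uses g = log(horizon), a constant level of exploration;
    decreasing=True uses g = log(horizon / count), which shrinks as the arm's
    sample size grows (an assumed form, not necessarily the paper's)."""
    g = math.log(horizon / count) if decreasing else math.log(horizon)
    return mean + sigma * math.sqrt(2 * max(g, 0.0) / count)


def run_ucb(means, horizon, sigma=1.0, decreasing=False, seed=0):
    """Simulate the index policy on Gaussian arms; return (pseudo-regret, pull counts)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # initialisation: pull every arm once
        else:
            # Pull the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda i: ucb_index(sums[i] / counts[i], counts[i],
                                        horizon, sigma, decreasing),
            )
        counts[arm] += 1
        sums[arm] += rng.gauss(means[arm], sigma)
        regret += best - means[arm]  # expected (pseudo-)regret of this pull
    return regret, counts


if __name__ == "__main__":
    means = [0.0, -0.5, -1.0]
    n = 10_000
    print("Lai-Robbins leading term:", lai_robbins_constant(means) * math.log(n))
    for decreasing in (False, True):
        regret, counts = run_ucb(means, n, decreasing=decreasing)
        print(f"decreasing={decreasing}: regret={regret:.1f}, pulls={counts}")
```

For the means [0.0, -0.5, -1.0] with sigma = 1, the Lai-Robbins constant is 2/0.5 + 2/1 = 6, so the lower bound grows like 6 log(n). Comparing the simulated regret of both exploration functions against this term gives a rough feel for the asymptotic claim; it is not a check of the paper's non-asymptotic bounds.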

Submitted: Oct 3, 2024

Topics

Multi Armed Bandit
Regret Bound
Confidence Bound
Gaussian Reward
