Bayesian Regret
Bayesian regret measures the expected gap in cumulative reward between an optimal policy and a learning algorithm, where the expectation is taken over a prior distribution on the environment; the goal is to keep this gap small as the horizon grows. Current research emphasizes efficient exploration strategies within frameworks such as linear bandits, reinforcement learning (RL), and contextual bandits, often employing Thompson Sampling and posterior sampling to balance exploration and exploitation. These advances yield improved regret bounds and better empirical performance in applications such as revenue management, robotics, and online decision-making, contributing to a deeper understanding of optimal learning under uncertainty. The resulting theoretical and algorithmic improvements inform the design of efficient and robust learning systems across many fields.
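As an illustration (a minimal sketch, not taken from the papers listed below), the snippet below estimates the Bayesian regret of Thompson Sampling on Bernoulli bandits: arm means are drawn from a Beta(1, 1) prior, and the regret E[ sum_t (mu* - mu_{A_t}) ] is averaged over both the prior and the algorithm's randomness. All function names, the prior, and the parameter values are illustrative assumptions.

```python
# Sketch: Thompson Sampling on Bernoulli bandits, with Bayesian regret
# estimated by averaging over environments drawn from a Beta(1, 1) prior.
# Names and parameters are illustrative, not from the cited papers.
import numpy as np

def thompson_sampling_run(true_means, horizon, rng):
    """Run Thompson Sampling once; return cumulative expected regret per step."""
    n_arms = len(true_means)
    alpha = np.ones(n_arms)  # Beta posterior: 1 + number of observed successes
    beta = np.ones(n_arms)   # Beta posterior: 1 + number of observed failures
    best_mean = np.max(true_means)
    regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        samples = rng.beta(alpha, beta)       # sample a mean for each arm from its posterior
        arm = int(np.argmax(samples))         # act greedily on the sampled means
        reward = rng.binomial(1, true_means[arm])
        alpha[arm] += reward                  # Bayesian update of the chosen arm
        beta[arm] += 1 - reward
        total += best_mean - true_means[arm]  # per-step expected regret
        regret[t] = total
    return regret

def bayesian_regret(n_arms=5, horizon=2000, n_envs=200, seed=0):
    """Average the regret curve over environments drawn from the prior."""
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_envs):
        true_means = rng.beta(1.0, 1.0, size=n_arms)  # draw an environment from the prior
        runs.append(thompson_sampling_run(true_means, horizon, rng))
    return np.mean(runs, axis=0)  # estimated Bayesian regret as a function of time

if __name__ == "__main__":
    curve = bayesian_regret()
    print(f"Estimated Bayesian regret after {len(curve)} steps: {curve[-1]:.2f}")
```

Under this uniform prior the averaged curve grows sublinearly in the horizon, which is the qualitative behavior the regret bounds above formalize; posterior-sampling methods for RL (e.g., PSRL) follow the same pattern of sampling an environment from the posterior and acting optimally for the sample.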
Papers
Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling
Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy
Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret
Yeoneung Kim, Gihun Kim, Insoon Yang