Regret Bound
Regret bound analysis quantifies the performance of online learning algorithms, particularly in settings such as multi-armed bandits and reinforcement learning, by bounding the regret: the gap between the cumulative reward of an optimal strategy and the reward the algorithm actually collects. Current research emphasizes developing algorithms with tighter regret bounds, often employing techniques such as optimism in the face of uncertainty, Thompson sampling, and advanced exploration strategies tailored to specific problem structures (e.g., linear models, contextual bandits). These improvements have significant implications for applications including personalized recommendations, online advertising, and resource allocation, by enabling more efficient decision-making under uncertainty.
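To make the regret notion concrete, here is a minimal sketch (not from any of the papers below) of the classic UCB1 algorithm on a hypothetical Bernoulli bandit instance, measuring cumulative regret as T times the best arm's mean minus the expected reward collected. The arm means and horizon are illustrative assumptions; UCB1 exemplifies the "optimism in the face of uncertainty" principle mentioned above and enjoys an O(log T) problem-dependent regret bound.

```python
import math
import random

def ucb1_regret(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit; return cumulative (pseudo-)regret.

    arm_means: true success probabilities (a hypothetical instance).
    Regret(T) = T * max(arm_means) - sum of means of the arms pulled.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # pulls per arm
    sums = [0.0] * k          # total observed reward per arm
    expected_reward = 0.0

    # Initialise by pulling each arm once.
    for a in range(k):
        sums[a] += 1.0 if rng.random() < arm_means[a] else 0.0
        counts[a] = 1
        expected_reward += arm_means[a]

    for t in range(k, horizon):
        # Optimism: choose the arm maximising empirical mean + confidence bonus.
        a = max(range(k), key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t + 1) / counts[i]))
        sums[a] += 1.0 if rng.random() < arm_means[a] else 0.0
        counts[a] += 1
        expected_reward += arm_means[a]

    return horizon * max(arm_means) - expected_reward
```

Because the confidence bonus shrinks as an arm is sampled, suboptimal arms are pulled only O(log T) times each, which is what yields the logarithmic regret growth rather than the linear regret of naive strategies.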
Papers
Bayesian Optimisation with Unknown Hyperparameters: Regret Bounds Logarithmically Closer to Optimal
Juliusz Ziomek, Masaki Adachi, Michael A. Osborne
Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism
Kihyun Yu, Duksang Lee, William Overman, Dabeen Lee
Queueing Matching Bandits with Preference Feedback
Jung-hun Kim, Min-hwan Oh