Optimal Regret

Research on optimal regret in online learning seeks algorithms that minimize regret, the gap between an algorithm's cumulative performance and that of an optimal strategy, particularly in settings with limited or delayed feedback. Current work emphasizes "best-of-both-worlds" algorithms that perform optimally in both stochastic and adversarial environments, often building on upper confidence bounds (UCB), Follow-The-Regularized-Leader (FTRL), and posterior sampling. These advances matter for applications such as online advertising, recommendation systems, and reinforcement learning, where they provide theoretically sound and practically efficient methods for sequential decision-making under uncertainty. The field is also actively exploring how constraints, delayed feedback, and high-dimensional data affect achievable regret bounds.
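To make the notion of regret concrete, here is a minimal sketch of one UCB-style method, the classic UCB1 rule, on a simulated Bernoulli bandit. It is an illustration only and is not taken from any of the papers listed below. The function name `ucb1_pseudo_regret`, the arm means, and the horizon are hypothetical choices made for this example. The quantity tracked is the pseudo-regret: the sum over rounds of the gap between the best arm's mean and the mean of the arm actually pulled.

```python
import math
import random


def ucb1_pseudo_regret(means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit.

    `means` are the (hypothetical) true arm means, known only to the simulator.
    Returns (pseudo_regret, pull_counts).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best_mean = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull each arm once so every empirical mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Expected loss of this pull relative to always playing the best arm.
        regret += best_mean - means[arm]

    return regret, counts


if __name__ == "__main__":
    regret, counts = ucb1_pseudo_regret([0.2, 0.5, 0.8], horizon=2000)
    print("pseudo-regret:", regret)
    print("pulls per arm:", counts)
```

Running the function at several horizons and comparing the returned regret against the horizon gives a rough empirical feel for sublinear growth; the exact numbers depend on the seed and the assumed arm means.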

Papers

February 19, 2024

Refining Minimax Regret for Unsupervised Environment Design
Michael Beukman, Samuel Coward, Michael Matthews, Mattie Fellows, Minqi Jiang, Michael Dennis, Jakob Foerster
Optimal Regret Minimax Regret Minimax Optimal Regret Unsupervised Environment Design

February 14, 2024

Second Order Methods for Bandit Optimization and Control
Arun Suggala, Y. Jennifer Sun, Praneeth Netrapalli, Elad Hazan
External Control Convex Optimization Second Order Optimal Regret Online Convex Optimization General Convex Loss

February 11, 2024

More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun
Reinforcement Learning Contextual Bandit Optimal Regret Distributional Reinforcement Learning Distributional Learning

February 7, 2024

Incentivized Truthful Communication for Federated Bandits
Zhepei Wei, Chuanhao Li, Tianze Ren, Haifeng Xu, Hongning Wang
Optimal Regret Sub Linear Regret Truthful Space Incentive Mechanism Incentive Compatibility Federated Bandit

December 15, 2023

Optimal Regret Bounds for Collaborative Learning in Bandits
Amitis Shidani, Sattar Vakili
Regret Bound Optimal Regret Regret Minimization Collaborative Learning Optimal Arm Multi Agent Multi Armed Bandit Collaborative Bandit

December 5, 2023

Projection Regret: Reducing Background Bias for Novelty Detection via Diffusion Models
Sungik Choi, Hankook Lee, Honglak Lee, Moontae Lee
Diffusion Model Optimal Regret Novelty Detection Background Bias

November 3, 2023

Distributed online constrained convex optimization with event-triggered communication
Kunpeng Zhang, Xinlei Yi, Yuzhe Li, Ming Cao, Tianyou Chai, Tao Yang
Time Varying Optimal Regret Online Convex Optimization Topology Aware Constrained Convex Optimization Event Triggered Online Primal Dual

November 2, 2023

High-dimensional Linear Bandits with Knapsacks
Wanteng Ma, Dong Xia, Jiashuo Jiang
High Dimensional Contextual Bandit Optimal Regret Sublinear Regret Replenishable Knapsack

October 28, 2023

Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards
Bo Xue, Yimu Wang, Yuanyu Wan, Jinfeng Yi, Lijun Zhang
Regret Bound Optimal Regret Efficient Algorithm Heavy Tailed Reward Gaussian Reward

September 21, 2023

Incentivized Communication for Federated Bandits
Zhepei Wei, Chuanhao Li, Haifeng Xu, Hongning Wang
Optimal Regret Institutional Incentive Federated Bandit

September 2, 2023

Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits
Haolin Liu, Chen-Yu Wei, Julian Zimmert
Optimal Regret Adversarial Loss Interactive Simulation Regret Rate Adversarial Linear Contextual Bandit Optimal Dependence

July 20, 2023

Player-optimal Stable Regret for Bandit Learning in Matching Markets
Fang Kong, Shuai Li
Optimal Regret Stable Matching Stable Regret Bandit Learning Matching Market

July 14, 2023

On the Sublinear Regret of GP-UCB
Justin Whitehouse, Zhiwei Steven Wu, Aaditya Ramdas
Gaussian Process Optimal Regret Sublinear Regret UCB Algorithm Regret Rate

July 5, 2023

Meta-Learning Adversarial Bandit Algorithms
Mikhail Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina Balcan, Kfir Y. Levy, Ron Meir, Zhiwei Steven Wu
Multi Armed Bandit Optimal Regret Linear Bandit Bandit Feedback Meta Learner

June 30, 2023

U-Calibration: Forecasting for an Unknown Agent
Robert Kleinberg, Renato Paes Leme, Jon Schneider, Yifeng Teng
Human Prediction Agent Smith State of the Art Forecasting Optimal Regret Sublinear Regret Regret Guarantee Calibration Error

June 16, 2023

Understanding the Role of Feedback in Online Learning with Switching Costs
Duo Cheng, Xingyu Zhou, Bo Ji
Integral Role Human Feedback Online Learning Optimal Regret Bandit Feedback Hidden Cost Minimax Regret