Simple Regret
Simple regret quantifies the suboptimality of the single decision an algorithm recommends after a period of exploration: the gap between the best achievable expected reward and the expected reward of the recommended action. It differs from cumulative regret, which sums losses over the entire sequence of decisions. Current research investigates simple regret across diverse applications, including reinforcement learning, online optimization, and multi-agent systems, employing algorithms such as Thompson Sampling, Follow-The-Perturbed-Leader, and various policy gradient methods, often within frameworks of online convex optimization. Understanding and minimizing simple regret is crucial for improving the efficiency and robustness of learning algorithms in dynamic environments, with impact on fields ranging from resource allocation to personalized recommendations.
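As a concrete illustration of the definition above, the following minimal Python sketch (a hypothetical toy example, not drawn from any of the listed papers) estimates the simple regret of a uniform-exploration recommendation rule on a Gaussian bandit: the learner explores round-robin for a fixed budget, recommends the arm with the highest empirical mean, and the simple regret is the gap between the best arm's mean and the recommended arm's mean.

import numpy as np

def simple_regret_uniform(means, budget, rng):
    # Explore arms round-robin for `budget` pulls, then recommend the arm
    # with the highest empirical mean; return its simple regret.
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(budget):
        arm = t % k                          # uniform (round-robin) exploration
        reward = rng.normal(means[arm], 1.0)  # Gaussian rewards, unit variance
        counts[arm] += 1
        sums[arm] += reward
    recommended = int(np.argmax(sums / np.maximum(counts, 1)))
    # Simple regret: best mean minus the recommended arm's true mean.
    return max(means) - means[recommended]

rng = np.random.default_rng(0)
means = [0.5, 0.6, 0.9]                      # hypothetical arm means
regrets = [simple_regret_uniform(means, budget=300, rng=rng) for _ in range(200)]
print("average simple regret:", np.mean(regrets))

Averaging over repeated runs gives an estimate of the expected simple regret; better exploration strategies (e.g., Thompson Sampling used for pure exploration) aim to drive this quantity to zero faster as the budget grows.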
Papers
Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets
Yifei Min, Tianhao Wang, Ruitu Xu, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang
Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm
Debabrota Basu, Odalric-Ambrym Maillard, Timothée Mathieu