Simple Regret
Simple regret quantifies the suboptimality of the single decision an algorithm recommends after a period of exploration: the gap between the best achievable expected reward and the expected reward of the recommended action. It differs from cumulative regret, which sums losses over the entire sequence of decisions. Current research investigates simple regret across diverse applications, including reinforcement learning, online optimization, and multi-agent systems, employing algorithms such as Thompson Sampling, Follow-The-Perturbed-Leader, and various policy gradient methods, often within frameworks of online convex optimization. Understanding and minimizing simple regret is crucial for improving the efficiency and robustness of learning algorithms in dynamic environments, with impact on fields ranging from resource allocation to personalized recommendations.
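As a concrete illustration of the definition above, the following minimal Python sketch (a hypothetical toy example, not drawn from any of the listed papers) estimates the simple regret of a uniform-exploration recommendation rule on a Gaussian bandit: the learner explores round-robin for a fixed budget, recommends the arm with the highest empirical mean, and the simple regret is the gap between the best arm's mean and the recommended arm's mean.

import numpy as np

def simple_regret_uniform(means, budget, rng):
    # Explore arms round-robin for `budget` pulls, then recommend the arm
    # with the highest empirical mean; return its simple regret.
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(budget):
        arm = t % k                          # uniform (round-robin) exploration
        reward = rng.normal(means[arm], 1.0)  # Gaussian rewards, unit variance
        counts[arm] += 1
        sums[arm] += reward
    recommended = int(np.argmax(sums / np.maximum(counts, 1)))
    # Simple regret: best mean minus the recommended arm's true mean.
    return max(means) - means[recommended]

rng = np.random.default_rng(0)
means = [0.5, 0.6, 0.9]                      # hypothetical arm means
regrets = [simple_regret_uniform(means, budget=300, rng=rng) for _ in range(200)]
print("average simple regret:", np.mean(regrets))

Averaging over repeated runs gives an estimate of the expected simple regret; better exploration strategies (e.g., Thompson Sampling used for pure exploration) aim to drive this quantity to zero faster as the budget grows.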
Papers
Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets
Yifei Min, Tianhao Wang, Ruitu Xu, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang
Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm
Debabrota Basu, Odalric-Ambrym Maillard, Timothée Mathieu