Frequentist Regret

Frequentist regret analysis in reinforcement learning and related fields quantifies the cumulative gap between an algorithm's realized performance and that of the optimal strategy, with guarantees required to hold for every fixed problem instance (a worst-case view) rather than on average over a prior, as in Bayesian regret. Current research emphasizes developing algorithms with provably efficient frequentist regret bounds, particularly for structured models such as multinomial logistic (MNL) models and bilinear exponential families, often using techniques such as optimistic sampling and Thompson sampling augmented with explicit exploration. These advances are significant because they provide theoretical performance guarantees that do not depend on prior assumptions, supporting more reliable and efficient solutions in applications ranging from multi-agent systems to resource allocation problems.
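
To make the distinction concrete, the display below gives the standard textbook definition of regret for a multi-armed bandit and contrasts the frequentist and Bayesian requirements; the notation is illustrative and not tied to any particular paper listed here:

```latex
% Regret of a policy over horizon T, for arms with unknown means \mu_a
% and arm played at round t denoted A_t:
R_T \;=\; \sum_{t=1}^{T} \left( \mu^{\star} - \mu_{A_t} \right),
\qquad \mu^{\star} := \max_{a} \mu_a .
% A frequentist bound requires E[R_T] \le f(T) for every fixed instance
% (\mu_a)_a, whereas a Bayesian bound only controls the prior average
% E_{\theta \sim \Pi}[ R_T(\theta) ].
```

As a complementary illustration of the algorithmic side, the following is a minimal sketch of Thompson sampling on a Bernoulli bandit that tracks its frequentist regret against the best fixed arm; the function name, problem instance, and parameters are assumptions made for this example, not drawn from the papers below:

```python
# Minimal sketch: Thompson sampling with Beta posteriors on a Bernoulli
# bandit, tracking per-round expected regret against the best fixed arm.
import numpy as np

def thompson_bernoulli(means, horizon, seed=0):
    rng = np.random.default_rng(seed)
    k = len(means)
    alpha = np.ones(k)   # Beta posterior: successes + 1 per arm
    beta = np.ones(k)    # Beta posterior: failures + 1 per arm
    best = max(means)
    regret = np.zeros(horizon)
    for t in range(horizon):
        theta = rng.beta(alpha, beta)      # one posterior sample per arm
        a = int(np.argmax(theta))          # play the arm with the best sample
        reward = rng.random() < means[a]   # Bernoulli reward draw
        alpha[a] += reward
        beta[a] += 1 - reward
        regret[t] = best - means[a]        # expected regret of this round
    return np.cumsum(regret)

# Cumulative regret should grow roughly logarithmically in the horizon.
print(thompson_bernoulli([0.3, 0.5, 0.7], horizon=5000)[-1])
```
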

Papers