Order-Optimal Regret

Order-optimal regret in reinforcement learning (RL) and related bandit problems concerns algorithms whose cumulative regret, the gap between the rewards actually collected and those an optimal policy would have collected, grows at the best rate permitted by known lower bounds, up to constant or logarithmic factors (for example, Θ(√T) minimax regret in many bandit settings). Current research pursues this optimality across a range of settings, including distributed systems, delayed feedback, and function approximation with kernel methods and linear models, often via algorithms such as Thompson sampling, Upper Confidence Bound (UCB) variants, and Follow-the-Perturbed-Leader. Such guarantees are crucial for building efficient and reliable RL agents in real-world applications where minimizing cumulative losses is paramount, particularly under limited communication or delayed information, and theoretical advances here translate directly into improved performance and resource efficiency in fields like personalized medicine, robotics, and online advertising.
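
To make the regret notion concrete, below is a minimal sketch of the classic UCB1 algorithm (Auer et al., 2002) on a Bernoulli multi-armed bandit, tracking cumulative pseudo-regret; UCB1 achieves instance-dependent O(log T) regret, an example of order optimality. The arm means, horizon, and seed are illustrative assumptions, not taken from any specific paper surveyed here.

```python
import numpy as np

def ucb1(means, horizon, rng):
    """Run UCB1 on a Bernoulli bandit; return cumulative pseudo-regret per step."""
    k = len(means)
    counts = np.zeros(k)        # number of pulls per arm
    estimates = np.zeros(k)     # empirical mean reward per arm
    regret = np.zeros(horizon)
    best = max(means)
    gap_sum = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once to initialize the estimates
        else:
            # UCB index: empirical mean plus an exploration bonus
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(estimates + bonus))
        reward = rng.binomial(1, means[arm])
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        gap_sum += best - means[arm]  # pseudo-regret accumulates the suboptimality gap
        regret[t] = gap_sum
    return regret

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = [0.5, 0.6, 0.7]  # illustrative arm means (an assumption)
    reg = ucb1(means, horizon=10_000, rng=rng)
    # Cumulative regret should grow roughly logarithmically, far slower than linear.
    for t in (100, 1_000, 10_000):
        print(f"t={t:>6}  cumulative regret={reg[t - 1]:.1f}")
```

Running the sketch shows the cumulative regret flattening as t grows, which is the qualitative signature of an order-optimal bandit algorithm; the settings surveyed above (delayed feedback, distributed agents, function approximation) complicate the index construction but aim for the same kind of rate guarantee.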

Papers