Stable Regret
Stable regret is a measure of an algorithm's performance in online decision-making problems: it is the gap between the optimal cumulative reward achievable with perfect knowledge and the cumulative reward the algorithm actually earns, so minimizing it means the algorithm quickly approaches the best available decision. Current research explores a range of algorithms, including proximal point methods, contextual bandits enhanced by large language models, and adaptations of existing algorithms such as UCT and Gale-Shapley, to achieve low stable regret in diverse settings such as multi-armed bandits, matching markets, and zero-sum games. These advances are significant because they improve the efficiency and robustness of online learning systems across numerous applications, from recommendation systems to reinforcement learning. Key areas of ongoing investigation include instance-optimal algorithms and methods that remain robust to adversarial or delayed feedback.
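To make the regret notion concrete, here is a minimal illustrative sketch (not taken from any of the works surveyed above): the classic UCB1 algorithm on a Bernoulli multi-armed bandit, where cumulative pseudo-regret is the expected shortfall of the arms actually pulled versus always playing the best arm. All function and variable names are hypothetical.

```python
import math
import random

def ucb1_regret(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return cumulative pseudo-regret.

    Pseudo-regret after T rounds = T * max(means) minus the sum of the
    true means of the arms actually pulled, i.e. the expected shortfall
    versus always playing the best arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # cumulative observed reward per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # play each arm once to initialize
        else:
            # UCB1 index: empirical mean plus an exploration bonus
            # that shrinks as an arm accumulates pulls.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]   # expected per-round shortfall

    return regret

if __name__ == "__main__":
    # Regret grows sublinearly in the horizon: the per-round shortfall
    # shrinks as the algorithm identifies the best arm.
    r = ucb1_regret([0.9, 0.5, 0.4], horizon=5000)
    print(f"cumulative pseudo-regret over 5000 rounds: {r:.1f}")
```

The same bookkeeping (cumulative shortfall against a benchmark policy) underlies stable regret in matching markets and zero-sum games, with the benchmark replaced by the appropriate stable or equilibrium outcome.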