Optimal Bandit

Optimal bandit algorithms aim to maximize cumulative reward in sequential decision-making problems where the outcome of each action is uncertain. Equivalently, they aim to minimize regret, the gap between the reward collected and the reward of the best comparator in hindsight. Recent research focuses on algorithms with improved regret bounds under various conditions. These include volatile (changing) environments, improving bandits, where an arm's reward increases the more often it is pulled, and extreme bandits, where the goal is to maximize the single largest reward observed rather than the cumulative sum. These advances rely on techniques such as multiplicative updates, subsampling, and self-concordant barrier functions to achieve optimal or near-optimal performance, with applications in online advertising, clinical trials, and resource allocation. Data-dependent regret guarantees and algorithms that compete against diverse comparator classes further widen the range of problems these methods can address.
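As an illustration of the multiplicative-update technique mentioned above, the sketch below implements EXP3, a standard exponential-weights algorithm for adversarial multi-armed bandits. It is not taken from any of the papers listed here. It assumes rewards lie in [0, 1], uses a fixed exploration rate `gamma`, and the names `exp3` and `reward_fn` are hypothetical. Each round it mixes the exponential weights with uniform exploration, samples an arm, forms an importance-weighted (unbiased) estimate of that arm's reward, and multiplies the arm's weight by `exp(gamma * estimate / K)`.

```python
import math
import random


def exp3(reward_fn, n_arms, horizon, gamma=0.1, seed=0):
    """Minimal EXP3 sketch (hypothetical helper, not from the listed papers).

    reward_fn(arm, t) must return a reward in [0, 1].
    Returns the list of chosen arms and the total reward collected.
    """
    rng = random.Random(seed)
    log_w = [0.0] * n_arms  # log-weights avoid overflow over long horizons
    chosen, total = [], 0.0
    for t in range(horizon):
        # Normalized exponential weights, mixed with uniform exploration.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        s = sum(w)
        probs = [(1 - gamma) * wi / s + gamma / n_arms for wi in w]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, t)
        # Importance weighting makes the estimate unbiased for this arm.
        estimate = reward / probs[arm]
        # Multiplicative update: w[arm] *= exp(gamma * estimate / K).
        log_w[arm] += gamma * estimate / n_arms
        chosen.append(arm)
        total += reward
    return chosen, total


# Usage: three Bernoulli arms (assumed means); regret is measured
# against always pulling the best arm.
means = [0.3, 0.5, 0.7]
env_rng = random.Random(1)
horizon = 5000
arms, total = exp3(lambda a, t: float(env_rng.random() < means[a]),
                   n_arms=len(means), horizon=horizon)
print("share of best arm:", arms.count(2) / horizon)
print("empirical regret:", horizon * max(means) - total)
```

Keeping log-weights rather than raw weights is a design choice for numerical stability; it leaves the sampling probabilities unchanged because they depend only on weight ratios.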

Papers

January 17, 2024

Adaptive Regret for Bandits Made Possible: Two Queries Suffice
Zhou Lu, Qiuyi Zhang, Xinyi Chen, Fred Zhang, David Woodruff, Elad Hazan
Multi Armed Bandit Query Information Optimal Regret Adaptive Regret Optimal Bandit

March 12, 2023

Data Dependent Regret Guarantees Against General Comparators for Full or Bandit Feedback
Kaan Gokcesu, Hakan Gokcesu
Contextual Bandit Bandit Feedback Regret Guarantee Learning Problem Loss Surface Dependent Regret Optimal Bandit

August 19, 2022

Mitigating Disparity while Maximizing Reward: Tight Anytime Guarantee for Improving Bandits
Vishakha Patil, Vineet Nair, Ganesh Ghalme, Arindam Khan
Multi Armed Bandit Reward Maximization Cumulative Reward Tight Guarantee Mitigating Disparity Optimal Bandit

March 21, 2022

Efficient Algorithms for Extreme Bandits
Dorian Baudry, Yoan Russac, Emilie Kaufmann
Multi Armed Bandit Quantile Regression Efficient Algorithm Non Asymptotic Bandit Problem Optimal Bandit

December 6, 2021

Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback
Wenjia Ba, Tianyi Lin, Jiawei Zhang, Zhengyuan Zhou
Optimal Regret Bandit Feedback Regret Learning Monotone Game Optimal Bandit Optimal No Regret Learning
