Bandit Feedback
Bandit feedback, in which only the reward of the chosen action is observed, is a central challenge in online learning and optimization. Current research develops efficient algorithms for settings such as constrained Markov decision processes (CMDPs), combinatorial bandits, and linear MDPs, often using Thompson sampling, optimism-based methods, and Frank-Wolfe techniques to manage the exploration-exploitation trade-off inherent in bandit feedback. These advances matter for real-world problems where full-information feedback is impractical or costly, such as online advertising, recommendation systems, and network optimization. A major focus is the design of algorithms with provable regret bounds and low computational complexity, driving progress in both theoretical understanding and practical applications.
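To make the feedback model concrete, here is a minimal sketch of Thompson sampling for a Bernoulli multi-armed bandit: in each round only the pulled arm's reward is revealed, and only that arm's Beta posterior is updated. The arm means and horizon are illustrative assumptions for the demo, not taken from any of the papers listed below.

```python
import numpy as np

def thompson_sampling(true_means, horizon, rng=None):
    """Thompson sampling for Bernoulli bandits under bandit feedback:
    each round, only the chosen arm's reward is observed."""
    rng = rng or np.random.default_rng(0)
    k = len(true_means)
    alpha = np.ones(k)  # Beta posterior: 1 + successes per arm
    beta = np.ones(k)   # Beta posterior: 1 + failures per arm
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior and
        # play the argmax; posterior uncertainty drives exploration.
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        # Bandit feedback: a reward is observed for this arm only.
        reward = rng.binomial(1, true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Illustrative (assumed) arm means and horizon for the demo.
means = [0.3, 0.5, 0.7]
T = 10_000
reward = thompson_sampling(means, T)
print(f"average reward: {reward / T:.3f}  (best arm mean: {max(means)})")
```

With enough rounds the average reward approaches the best arm's mean, illustrating the sublinear-regret behavior that the algorithms surveyed above aim to guarantee in richer settings.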
Papers
A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
Guanyu Nie, Yididiya Y Nadew, Yanhui Zhu, Vaneet Aggarwal, Christopher John Quinn
Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics
Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
Uri Sherman, Tomer Koren, Yishay Mansour