Bandit Feedback

Bandit feedback, where only the reward of the chosen action is observed, presents a significant challenge in online learning and optimization problems. Current research focuses on developing efficient algorithms for various settings, including constrained Markov decision processes (CMDPs), combinatorial bandits, and linear MDPs, often employing techniques like Thompson sampling, optimistic algorithms, and Frank-Wolfe methods to address the exploration-exploitation dilemma inherent in bandit feedback. These advancements are crucial for tackling real-world problems with limited feedback, such as online advertising, recommendation systems, and network optimization, where obtaining full information is impractical or costly. The development of algorithms with provable regret bounds and efficient computational complexity is a major focus, driving progress in both theoretical understanding and practical applications.

Papers

October 11, 2022

On Adaptivity in Non-stationary Stochastic Optimization With Bandit Feedback
Yining Wang
Adaptive Importance Bandit Feedback Non Stationary Dynamic Regret Bandit Convex Optimal Dynamic Regret

September 27, 2022

Doubly-Optimistic Play for Safe Linear Bandits
Tianrui Chen, Aditya Gangrade, Venkatesh Saligrama
Bandit Feedback O$ Regret Optimal Action Safe Linear Bandit

September 20, 2022

Diversified Recommendations for Agents with Adaptive Preferences
Arpit Agarwal, William Brown
Agent Smith Bandit Feedback Adversarial Bandit Clean Distribution Recommendation Diversity

September 15, 2022

Risk-aware linear bandits with convex loss
Patrick Saux, Odalric-Ambrym Maillard
Multi Armed Bandit Contextual Bandit Bandit Feedback Convex Loss Risk Aware

September 8, 2022

Sequential Information Design: Learning to Persuade in the Dark
Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Francesco Trovo
Bandit Feedback O$ Regret Sequential Decision Making Persuasive Argument Persuasive Capability Efficient Realization Sequential Information Design

September 6, 2022

September 3, 2022

Sharp bounds on the price of bandit feedback for several models of mistake-bounded online learning
Raymond Feng, Jesse Geneson, Andrew Lee, Espen Slettnes
Reinforcement Learning Full Model Bandit Feedback Curious Price Sharp Bound Supervised Reinforcement Learning Mistake Bound

August 21, 2022

August 16, 2022

Online Learning for Non-monotone Submodular Maximization: From Full Information to Bandit Feedback
Qixin Zhang, Zengde Deng, Zaiyi Chen, Kuangqi Zhou, Haoyuan Hu, Yu Yang
Online Learning Submodular Maximization Full Information Bandit Feedback Sublinear Regret Bandit Algorithm Convex Set DR Submodular

August 13, 2022

Double Auctions with Two-sided Bandit Feedback
Soumya Basu, Abishek Sankararaman
Bandit Feedback Two Sided Double Auction

August 1, 2022

Boosted Off-Policy Learning
Ben London, Levi Lu, Ted Sandler, Thorsten Joachims
Supervised Learning Policy Learning Bandit Feedback Token Boosting Shallow Learning

July 16, 2022

Collaborative Learning in Kernel-based Bandits for Distributed Users
Sudeep Salgia, Sattar Vakili, Qing Zhao
Gaussian Process Bandit Feedback Collaborative Learning User Base Kernel Bandit Order Optimal Regret

July 7, 2022

Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback
Adhyyan Narang, Omid Sadeghi, Lillian J Ratliff, Maryam Fazel, Jeff Bilmes
Submodular Maximization Bandit Feedback Balancing Efficiency Monotone Submodular Function

June 18, 2022

Mutation-Driven Follow the Regularized Leader for Last-Iterate Convergence in Zero-Sum Games
Kenshi Abe, Mitsuki Sakamoto, Atsushi Iwasaki
Nash Equilibrium Bandit Feedback Zero Sum Game Follow the Regularized Leader Two Player Zero Sum Last Iterate Convergence

June 6, 2022

Robust Pareto Set Identification with Contaminated Bandit Feedback
İlter Onat Korkmaz, Efe Eren Ceyani, Kerem Bozgan, Cem Tekin
Bandit Feedback Pareto Optimal Gaussian Reward Multi Objective Multi Armed Bandit Reward Inference Pareto Set Identification

June 4, 2022

Learning in Congestion Games with Bandit Feedback
Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
LeArning Abstract Bandit Feedback Regret Minimization Congestion Game

June 2, 2022

Dynamic Structure Estimation from Bandit Feedback
Motoya Ohnishi, Isao Ishikawa, Yuko Kuroki, Masahiro Ikeda
Bandit Feedback Complex Dynamic Periodicity Detection Discrete Dynamical System Structure Estimation

May 27, 2022

Meta-Learning Adversarial Bandits
Maria-Florina Balcan, Keegan Harris, Mikhail Khodak, Zhiwei Steven Wu
Multi Armed Bandit Optimal Regret Linear Bandit Bandit Feedback Online Mirror Descent