Multi-Armed Bandit

The multi-armed bandit problem models sequential decision-making under uncertainty: a learner repeatedly selects among actions (arms) with unknown payoff distributions, aiming to maximize cumulative reward by balancing exploration of poorly understood arms against exploitation of arms that currently look best. Current research emphasizes refining algorithms such as Thompson Sampling, Upper Confidence Bound (UCB), and Follow-the-Perturbed-Leader (FTPL), often incorporating techniques such as distribution matching, and addressing complications like delayed feedback, asynchronous agents, and non-linear reward structures. These advances are crucial for applications across diverse fields, including online advertising, clinical trials, recommendation systems, and materials discovery, where efficient exploration and exploitation are paramount.
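To make the exploration-exploitation trade-off concrete, here is a minimal sketch of two of the algorithms mentioned above, UCB1 and Thompson Sampling, on a Bernoulli bandit. The arm payoff probabilities, horizon, and Beta(1, 1) priors are illustrative assumptions, not taken from any particular paper.

```python
import numpy as np

def ucb1(means, horizon, rng):
    """UCB1: pull the arm maximizing empirical mean + confidence radius."""
    k = len(means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play each arm once to initialize counts
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        reward = float(rng.random() < means[arm])  # Bernoulli reward draw
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward

def thompson_sampling(means, horizon, rng):
    """Thompson Sampling with Beta(1, 1) priors on Bernoulli arms."""
    k = len(means)
    successes = np.ones(k)  # Beta posterior alpha per arm
    failures = np.ones(k)   # Beta posterior beta per arm
    total_reward = 0.0
    for _ in range(horizon):
        samples = rng.beta(successes, failures)  # one posterior draw per arm
        arm = int(np.argmax(samples))            # act greedily on the draw
        reward = float(rng.random() < means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        total_reward += reward
    return total_reward

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = [0.3, 0.5, 0.7]  # hypothetical arm payoff probabilities
    horizon = 10_000
    best = max(means) * horizon  # expected reward of always playing the best arm
    for name, algo in [("UCB1", ucb1), ("Thompson", thompson_sampling)]:
        reward = algo(means, horizon, rng)
        print(f"{name}: reward={reward:.0f}, regret={best - reward:.0f}")
```

Both methods spend early rounds sampling every arm but concentrate pulls on the best arm as evidence accumulates, so cumulative regret grows sublinearly with the horizon rather than linearly as it would under uniform random play.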

Papers