Multi-Armed Bandit
Multi-armed bandits (MABs) are a framework for sequential decision-making under uncertainty, in which a learner aims to maximize cumulative reward by strategically selecting actions (arms) with unknown payoff distributions. Current research emphasizes extending MABs to non-stationary environments, incorporating human trust and biases, and addressing computational challenges through algorithms such as Thompson Sampling and Upper Confidence Bound (UCB) variants, as well as novel architectures like Bandit Networks. These advances are improving diverse applications, including personalized recommendation, resource allocation, and financial portfolio optimization, by enabling more efficient and adaptive decision-making in complex, real-world settings.
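To make the setting concrete, below is a minimal sketch of Thompson Sampling for a Bernoulli bandit: the learner keeps a Beta posterior per arm, samples from each posterior, and pulls the arm with the largest sample. The arm probabilities, horizon, and function name are illustrative assumptions, not taken from any of the papers listed below.

```python
import random

def thompson_sampling(true_probs, horizon=10_000, seed=0):
    """Bernoulli Thompson Sampling (illustrative sketch).

    true_probs: hypothetical, unknown-to-the-learner success probability of each arm.
    Returns the total reward collected over `horizon` pulls.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [1] * n_arms  # Beta(1, 1) uniform prior per arm
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior and pick the best.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Example: three arms with unknown payoff probabilities 0.2, 0.5, and 0.7.
print(thompson_sampling([0.2, 0.5, 0.7]))
```

UCB-style algorithms follow the same loop but replace the posterior sampling step with a deterministic choice of the arm maximizing its empirical mean plus an exploration bonus.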
Papers
SAVME: Efficient Safety Validation for Autonomous Systems Using Meta-Learning
Marc R. Schlichting, Nina V. Boord, Anthony L. Corso, Mykel J. Kochenderfer
Incentivizing Massive Unknown Workers for Budget-Limited Crowdsensing: From Off-Line and On-Line Perspectives
Feng Li, Yuqi Chai, Huan Yang, Pengfei Hu, Lingjie Duan
The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits
Sepehr Assadi, Chen Wang
Getting too personal(ized): The importance of feature choice in online adaptive algorithms
ZhaoBin Li, Luna Yee, Nathaniel Sauerberg, Irene Sakson, Joseph Jay Williams, Anna N. Rafferty