Robust Multi-Armed Bandit

Robust multi-armed bandit (MAB) algorithms aim to optimize sequential decision-making when reward distributions are uncertain, potentially adversarial, or non-stationary. Current research focuses on algorithms that remain resilient to adversarial attacks on rewards, model misspecification (e.g., non-linear reward functions or inaccurate user models), and distributional shifts, often employing techniques such as Thompson sampling, regularized learning, and meta-learning frameworks. These advances matter for the reliability and performance of MABs in real-world applications such as personalized medicine, recommendation systems, and online advertising, where robustness to unexpected events and data irregularities is paramount.
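
To make the setting concrete, here is a minimal sketch of plain Beta-Bernoulli Thompson sampling run against rewards that are randomly flipped with some probability. This is an illustration only: it is the standard, non-robust form of Thompson sampling, not a method from any specific paper in this area, and the corruption model (independent random flips) is a deliberately simple stand-in for the adversarial attacks studied in the literature. The function name `thompson_sampling` and the parameters `true_means`, `horizon`, `corrupt_prob`, and `seed` are hypothetical names chosen for this example.

```python
import random


def thompson_sampling(true_means, horizon, corrupt_prob=0.0, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return pull counts per arm.

    Assumptions (illustrative only): rewards are Bernoulli with the given
    means, and each observed reward is flipped with probability
    `corrupt_prob` to mimic a crude form of reward corruption.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Beta(1, 1) prior on each arm's mean, i.e. uniform on [0, 1].
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Draw the true Bernoulli reward, then possibly corrupt it.
        reward = 1 if rng.random() < true_means[arm] else 0
        if rng.random() < corrupt_prob:
            reward = 1 - reward

        # Conjugate posterior update from the (possibly corrupted) reward.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    clean = thompson_sampling([0.2, 0.8], horizon=2000)
    noisy = thompson_sampling([0.2, 0.8], horizon=2000, corrupt_prob=0.1)
    print("pulls without corruption:", clean)
    print("pulls with 10% flipped rewards:", noisy)
```

With random flips at a rate below one half, the ordering of the arms' observed means is preserved, so the vanilla algorithm still tends to favor the better arm. Targeted attacks that concentrate corruption on specific arms or rounds are harder to withstand, which is the kind of gap robust variants aim to close.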

Papers