Adversarial Bandit

Adversarial bandits address the challenge of sequential decision-making under uncertainty where rewards are chosen by an adversary rather than drawn from a stochastic process. Current research focuses on improving algorithmic robustness to challenges such as global environmental shifts, delayed feedback (especially from multiple users), unbounded losses, and switching costs, often employing techniques like Follow-the-Perturbed-Leader and variants of EXP3. These advances aim to broaden the performance and applicability of adversarial bandit algorithms in settings such as online advertising, resource allocation, and reinforcement learning, where unpredictable or malicious influences are present. The field is also exploring fairness considerations and the impact of adversarial attacks on the algorithms themselves.
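Since EXP3 is the canonical baseline mentioned above, a minimal sketch may help fix ideas. This is the classic exponential-weights update with importance-weighted reward estimates; the function name and the adversarial reward callback `rewards_fn` are illustrative, not from any specific paper:

```python
import math
import random

def exp3(n_arms, gamma, rewards_fn, horizon):
    """Sketch of EXP3 for adversarial bandits.

    gamma      -- exploration rate in (0, 1]
    rewards_fn -- callback (round, arm) -> reward in [0, 1],
                  which an adversary is free to choose arbitrarily
    """
    weights = [1.0] * n_arms
    total_reward = 0.0
    for t in range(horizon):
        w_sum = sum(weights)
        # Mix the exponential weights with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = rewards_fn(t, arm)
        total_reward += reward
        # Importance-weighted estimate: only the pulled arm is updated,
        # scaled by 1/probs[arm] to keep the estimate unbiased.
        x_hat = reward / probs[arm]
        weights[arm] *= math.exp(gamma * x_hat / n_arms)
    return total_reward, weights
```

Against an oblivious adversary that fixes one good arm, the weight of that arm comes to dominate while the `gamma / n_arms` term guarantees every arm keeps being explored, which is what yields the sublinear regret bound for EXP3.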

Papers