Multi-Armed Bandit

Multi-armed bandits (MABs) are a framework for sequential decision-making under uncertainty, aiming to maximize cumulative rewards by strategically selecting actions (arms) with unknown payoffs. Current research focuses on improving algorithm efficiency and robustness, addressing challenges like action erasures, limited memory, and reward contamination, often employing variations of Thompson sampling, Upper Confidence Bound (UCB), and successive elimination algorithms. These advancements have implications for various fields, including online advertising, clinical trials, and resource allocation, by enabling more efficient and reliable decision-making in dynamic environments.
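
To make the framework concrete, below is a minimal sketch of the UCB1 algorithm, one of the Upper Confidence Bound methods mentioned above, on simulated Bernoulli arms. The arm success probabilities, horizon, and function name are illustrative assumptions, not taken from any particular paper.

```python
import math
import random

def ucb1(reward_probs, horizon=10_000, seed=0):
    """Minimal UCB1 on Bernoulli arms (illustrative sketch): pull each arm once,
    then repeatedly pick the arm with the highest empirical mean plus an
    exploration bonus."""
    rng = random.Random(seed)
    k = len(reward_probs)
    counts = [0] * k      # number of pulls per arm
    means = [0.0] * k     # empirical mean reward per arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # play every arm once to initialize the estimates
        else:
            # UCB1 index: empirical mean + sqrt(2 * ln(t) / n_i)
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        total_reward += reward

    return total_reward, counts

# Hypothetical example: three arms with hidden success probabilities 0.3, 0.5, 0.7;
# the best arm should end up with the large majority of the pulls.
print(ucb1([0.3, 0.5, 0.7]))
```

Thompson sampling and successive elimination address the same exploration-exploitation trade-off but replace the confidence-bound index with posterior sampling and with staged arm discarding, respectively.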

Papers