Two Armed

Two-armed bandit problems, a core topic in sequential decision-making, ask how to identify which of two arms has the higher expected reward within a limited budget of trials. Current research emphasizes designing algorithms and proving their optimality, particularly when the difference in expected reward between the arms is small (the "small-gap regime"), often using variants of Neyman allocation and inverse probability weighting. These results refine our understanding of optimal resource allocation under uncertainty, with applications ranging from clinical trials to online recommendation systems.
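
As a rough illustration of the two techniques named above, the following minimal sketch pulls each arm with probability proportional to its reward standard deviation (Neyman allocation) and then compares inverse-probability-weighted (IPW) estimates of the two means. It assumes Gaussian rewards and standard deviations known in advance, which is a simplification: methods in the literature typically estimate the variances adaptively. All function and parameter names here are hypothetical and are not taken from any particular paper.

```python
import random


def neyman_allocation(sigma0, sigma1):
    """Probability of pulling arm 0: proportional to its standard deviation."""
    return sigma0 / (sigma0 + sigma1)


def ipw_mean_estimates(arms, rewards, probs):
    """IPW estimates of both arm means.

    arms[t] is the arm pulled in round t (0 or 1), rewards[t] the observed
    reward, and probs[t] the probability with which arm 0 was pulled.
    """
    n = len(arms)
    est0 = sum(r / p for a, r, p in zip(arms, rewards, probs) if a == 0) / n
    est1 = sum(r / (1 - p) for a, r, p in zip(arms, rewards, probs) if a == 1) / n
    return est0, est1


def run_experiment(mu, sigma, budget, seed=0):
    """Spend `budget` pulls, then recommend the arm with the larger IPW estimate.

    Assumption: sigma is known up front; a real procedure would estimate it.
    """
    rng = random.Random(seed)
    p0 = neyman_allocation(sigma[0], sigma[1])
    arms, rewards, probs = [], [], []
    for _ in range(budget):
        arm = 0 if rng.random() < p0 else 1
        arms.append(arm)
        rewards.append(rng.gauss(mu[arm], sigma[arm]))
        probs.append(p0)
    estimates = ipw_mean_estimates(arms, rewards, probs)
    best = 0 if estimates[0] >= estimates[1] else 1
    return best, estimates


if __name__ == "__main__":
    best, estimates = run_experiment(mu=(0.0, 0.5), sigma=(1.0, 3.0), budget=5000)
    print("recommended arm:", best, "IPW estimates:", estimates)
```

Under this allocation the noisier arm receives more of the budget, which is what equalizes the precision of the two estimates. In the small-gap regime the recommendation becomes unreliable unless the budget is large relative to the inverse squared gap.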

Papers