Arm Selection

Arm selection, the core problem in multi-armed bandit settings, is the task of efficiently identifying the optimal "arm" (option) from a set in order to maximize cumulative reward or, equivalently, minimize regret. Current research emphasizes algorithms that remain robust to challenges such as non-stationary preferences, adversarial environments, and limited precision in arm selection, often building on Thompson sampling, UCB (Upper Confidence Bound) methods, and neural network-based approaches. By enabling more adaptive and informed decision-making under uncertainty, these advances improve efficiency in diverse applications, including online recommendation systems, clinical trials, resource allocation, and optimization problems such as shortest-path routing.
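
As a concrete illustration of one of the techniques named above, the sketch below implements the classic UCB1 rule for arm selection. It is a minimal, self-contained example: the function names (`ucb1_select`, `run_ucb1`), the Bernoulli test arms, and the horizon are illustrative assumptions, not drawn from any particular paper listed here.

```python
import math
import random

def ucb1_select(counts, values, t):
    """Pick the arm with the highest UCB1 score at round t.

    counts[i]: number of times arm i has been pulled.
    values[i]: empirical mean reward of arm i.
    """
    # Pull each arm once before applying the confidence bound.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # UCB1 score: empirical mean plus an exploration bonus that shrinks
    # as an arm is pulled more often.
    scores = [v + math.sqrt(2 * math.log(t) / n) for v, n in zip(values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def run_ucb1(reward_fns, horizon):
    """Run UCB1 for `horizon` rounds against arms given as reward-sampling functions."""
    k = len(reward_fns)
    counts = [0] * k
    values = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, values, t)
        reward = reward_fns[arm]()
        counts[arm] += 1
        # Incremental update of the empirical mean reward for the chosen arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total, counts

if __name__ == "__main__":
    # Three Bernoulli arms with unknown means; UCB1 should concentrate its
    # pulls on the best arm (mean 0.7) while keeping regret low.
    arms = [lambda p=p: 1.0 if random.random() < p else 0.0 for p in (0.3, 0.5, 0.7)]
    total, counts = run_ucb1(arms, horizon=5000)
    print("total reward:", total, "pull counts:", counts)
```

The same loop structure carries over to the other approaches mentioned above; for example, Thompson sampling would replace the confidence-bound score with a draw from each arm's posterior, selecting the arm whose sample is largest.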

Papers