Semi Bandit

Semi-bandit problems concern sequentially selecting subsets of options (arms) to maximize cumulative reward under partial feedback: the learner observes the reward of each arm it selected, but not the rewards of the arms it left out. Current research focuses on making algorithms efficient for large-scale problems (e.g., algorithms with sublinear time complexity) and on handling complications such as non-stationary environments, causal relationships between arms, and risk constraints. These advances matter for applications such as online advertising, recommendation systems, and resource allocation, where efficient and robust decision-making under uncertainty is crucial.
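To make the feedback model concrete, the following is a minimal sketch of a top-k semi-bandit loop with a UCB-style index. It is an illustration under stated assumptions, not a method taken from any particular paper listed here: the function name `run_cucb`, the Bernoulli reward model, the exploration constant 1.5, and the example means are all hypothetical choices.

```python
import math
import random


def run_cucb(means, k, horizon, seed=0):
    """Simulate a top-k semi-bandit with a UCB-style index (illustrative only)."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n    # number of times each arm was selected
    totals = [0.0] * n  # sum of rewards observed for each arm
    cumulative = 0.0

    for t in range(1, horizon + 1):
        # Optimistic index per arm; unplayed arms get +inf so each is tried once.
        index = [
            float("inf") if counts[i] == 0
            else totals[i] / counts[i] + math.sqrt(1.5 * math.log(t) / counts[i])
            for i in range(n)
        ]
        # Select the subset: the k arms with the largest index.
        chosen = sorted(range(n), key=lambda i: index[i], reverse=True)[:k]

        # Semi-bandit feedback: a reward is drawn and observed for every
        # selected arm, and nothing is learned about the unselected arms.
        for i in chosen:
            reward = 1.0 if rng.random() < means[i] else 0.0  # assumed Bernoulli
            counts[i] += 1
            totals[i] += reward
            cumulative += reward

    return counts, cumulative


if __name__ == "__main__":
    counts, cumulative = run_cucb([0.9, 0.8, 0.2, 0.1, 0.1], k=2, horizon=500)
    print("selection counts:", counts)
    print("cumulative reward:", cumulative)
```

The point of the sketch is the update step: statistics change only for the arms in `chosen`, which is what distinguishes semi-bandit feedback from full-information feedback (all arms observed) and from plain bandit feedback (only the subset's total reward observed).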

Papers