Combinatorial Bandit

Combinatorial bandits address the challenge of sequentially selecting subsets of options (arms) to maximize cumulative reward, a problem arising in diverse fields such as finance, resource allocation, and machine learning. Current research focuses on efficient algorithms, notably Thompson Sampling and Upper Confidence Bound (UCB) variants, that handle non-stationary rewards, diverse feedback structures (e.g., semi-bandit and full-bandit feedback), and constraints such as budget limits or diversity requirements; novel architectures such as master-slave models are also being explored. These advances improve decision-making in applications where selecting optimal combinations of actions under uncertainty is crucial, enabling better resource allocation, more effective algorithm configuration, and stronger performance in online systems.
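As a concrete illustration of the UCB family mentioned above, the following is a minimal sketch of a combinatorial UCB-style strategy with semi-bandit feedback for a top-k selection constraint: each round the learner plays the k arms with the highest UCB index and observes each chosen arm's reward individually. All names, parameters, and the Bernoulli reward model here are illustrative assumptions, not taken from any specific paper.

```python
import math
import random

def cucb_top_k(true_means, k, horizon, seed=0):
    """Illustrative combinatorial UCB sketch (not a specific paper's
    algorithm): play the k arms with the highest UCB index each round
    and update each chosen arm from its own observed reward
    (semi-bandit feedback). Rewards are Bernoulli for simplicity."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n          # number of times each arm was played
    means = [0.0] * n         # empirical mean reward of each arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # UCB index; unplayed arms get infinity so each is tried once
        ucb = [means[i] + math.sqrt(2 * math.log(t) / counts[i])
               if counts[i] > 0 else float("inf")
               for i in range(n)]
        # Oracle for the top-k constraint: the k largest UCB indices
        chosen = sorted(range(n), key=lambda i: ucb[i], reverse=True)[:k]
        for i in chosen:      # semi-bandit: one sample per chosen arm
            r = 1.0 if rng.random() < true_means[i] else 0.0
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]
            total_reward += r
    return total_reward, counts

reward, counts = cucb_top_k([0.9, 0.8, 0.3, 0.2], k=2, horizon=2000)
```

For general combinatorial actions the top-k sort would be replaced by a problem-specific combinatorial oracle (e.g., a shortest-path or matching solver), which is what distinguishes this setting from running k independent bandits.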
