Combinatorial Multi-Armed Bandit

Combinatorial multi-armed bandits (CMABs) address the challenge of sequentially selecting subsets of base arms (super arms) to maximize cumulative reward, where the reward of each chosen subset depends jointly on the outcomes of its constituent arms. Current research focuses on developing efficient algorithms, such as UCB and Thompson Sampling variants, tailored to various feedback structures (full-bandit, semi-bandit, max-value index) and reward functions (linear, submodular, non-monotone), often incorporating constraints such as budgets or costs. This field is significant because CMABs model numerous real-world problems, including recommendation systems, resource allocation, and crowdsourcing, offering a powerful framework for optimizing sequential decision-making under uncertainty.
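
As an illustration of the semi-bandit, linear-reward setting mentioned above, the following is a minimal Python sketch of a CUCB-style algorithm under a cardinality (top-k) constraint. The function name `cucb`, the exploration constant, and the simulated Bernoulli base arms are illustrative assumptions, not details taken from any specific paper.

```python
import numpy as np

def cucb(n_arms, k, true_means, horizon, seed=0):
    """CUCB-style loop: each round, play the k base arms with the largest
    UCB indices (linear reward, size-k constraint, semi-bandit feedback)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)   # how often each base arm has been played
    means = np.zeros(n_arms)    # empirical mean reward of each base arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        # Optimistic index per base arm; unplayed arms get +inf so they are tried first.
        ucb = np.full(n_arms, np.inf)
        played = counts > 0
        ucb[played] = means[played] + np.sqrt(1.5 * np.log(t) / counts[played])

        # Oracle step: for a linear reward with a cardinality constraint, the best
        # super arm is simply the k base arms with the largest indices.
        super_arm = np.argsort(-ucb)[:k]

        # Semi-bandit feedback: observe the reward of every chosen base arm.
        rewards = rng.binomial(1, true_means[super_arm]).astype(float)
        total_reward += rewards.sum()

        # Incremental update of the per-arm statistics.
        counts[super_arm] += 1
        means[super_arm] += (rewards - means[super_arm]) / counts[super_arm]

    return total_reward

# Example: 10 Bernoulli base arms, pick 3 per round for 5,000 rounds.
if __name__ == "__main__":
    arm_means = np.linspace(0.1, 0.9, 10)
    print(cucb(n_arms=10, k=3, true_means=arm_means, horizon=5000))
```

With other feedback structures or reward functions (e.g., full-bandit feedback or submodular rewards), the per-arm update and the oracle step change, but the overall explore-then-select loop above stays the same.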

Papers