Combinatorial Semi-Bandit
Combinatorial semi-bandits address the problem of sequentially selecting subsets of items (arms) to maximize cumulative reward, where the learner observes feedback only for the items in the selected subset. Current research focuses on improving algorithmic efficiency for large-scale problems, handling delayed or non-stationary rewards, incorporating fairness constraints, and accounting for risk aversion, typically by building on Thompson Sampling, Upper Confidence Bound (UCB) variants, and Follow the Regularized Leader (FTRL). These advances matter for applications such as online advertising, crowdsourcing, and resource allocation in transportation networks, where efficient and robust decision-making under uncertainty is crucial. The field is also exploring the impact of causal relationships between rewards, and the use of approximation oracles when the underlying subset-selection problem is computationally hard to solve exactly.
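As a rough illustration of the select-then-observe loop, the Python sketch below runs a UCB-style combinatorial strategy on the simplest possible constraint: choose k items per round. It is a minimal sketch under stated assumptions, not a reference implementation. The function name cucb_top_k, the Bernoulli reward model, the example means, and the exploration constant 1.5 are all assumptions chosen for illustration. The "oracle" here is just a sort, which is exact for top-k; harder constraints would need a problem-specific (possibly approximate) oracle in its place.

```python
import math
import random


def cucb_top_k(means, k, horizon, seed=0):
    """UCB-style combinatorial semi-bandit loop for a top-k problem.

    means   -- hypothetical true success probabilities (hidden from the learner)
    k       -- number of items selected per round
    horizon -- number of rounds to play
    Returns (observation counts per item, total reward collected).
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n        # how often each item has been observed
    estimates = [0.0] * n   # running empirical mean reward per item
    total_reward = 0.0

    for t in range(1, horizon + 1):
        # Optimistic index per item; never-observed items get +inf
        # so that each one is tried at least once.
        ucb = [
            estimates[i] + math.sqrt(1.5 * math.log(t) / counts[i])
            if counts[i] > 0 else float("inf")
            for i in range(n)
        ]
        # Oracle step (assumed top-k constraint): take the k largest indices.
        chosen = sorted(range(n), key=lambda i: ucb[i], reverse=True)[:k]

        # Semi-bandit feedback: one reward is observed per chosen item,
        # and nothing is observed for the items left out.
        for i in chosen:
            reward = 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            estimates[i] += (reward - estimates[i]) / counts[i]
            total_reward += reward

    return counts, total_reward


if __name__ == "__main__":
    counts, total = cucb_top_k([0.9, 0.8, 0.2, 0.1, 0.1], k=2, horizon=2000)
    print("observations per item:", counts)
    print("total reward:", total)
```

The per-item update inside the inner loop is what distinguishes semi-bandit feedback from full-bandit feedback, where only the aggregate reward of the chosen subset would be available.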