Structured Bandit

Structured bandits address sequential decision-making problems in which the arms are not independent but related through shared underlying structure, such as common parameters or features. Research focuses on algorithms that exploit this structure to balance exploration and exploitation with low regret, often using Thompson sampling, decision transformers, and saddle-point optimization within Bayesian or frequentist frameworks. These advances improve the efficiency and robustness of decision-making in applications such as personalized recommendation, resource allocation, and online advertising by enabling better generalization and adaptation to unseen scenarios. The field is actively pursuing tighter theoretical performance bounds and scalable algorithms for high-dimensional problems.
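As a concrete illustration of one common approach, the sketch below shows Thompson sampling for a linear bandit, a standard structured setting in which all arms share an unknown parameter vector and the expected reward of an arm is the inner product of its feature vector with that parameter. This is a minimal illustrative example, not a reference implementation from any particular paper; the dimensions, prior, and noise level are assumed values chosen for demonstration.

```python
# Minimal sketch: Thompson sampling for a linear (structured) bandit.
# Assumption: expected reward of arm x is x @ theta for a shared unknown theta.
import numpy as np

rng = np.random.default_rng(0)

d, n_arms, horizon = 5, 20, 2000          # illustrative problem sizes
theta_true = rng.normal(size=d)           # unknown shared parameter (hidden)
arms = rng.normal(size=(n_arms, d))       # feature vector per arm
noise_sd, prior_var = 0.5, 1.0            # assumed noise and prior scales

# Gaussian posterior over theta, tracked via precision matrix B and vector f:
# posterior covariance = inv(B), posterior mean = inv(B) @ f.
B = np.eye(d) / prior_var
f = np.zeros(d)

best_mean = np.max(arms @ theta_true)
regret = 0.0

for t in range(horizon):
    # Explore by sampling a plausible parameter from the current posterior.
    cov = np.linalg.inv(B)
    theta_sample = rng.multivariate_normal(cov @ f, cov)

    # Exploit the sample: play the arm that looks best under it.
    a = int(np.argmax(arms @ theta_sample))
    x = arms[a]

    # Observe a noisy reward and update the posterior in closed form.
    reward = x @ theta_true + noise_sd * rng.normal()
    B += np.outer(x, x) / noise_sd**2
    f += x * reward / noise_sd**2

    regret += best_mean - x @ theta_true

print(f"cumulative regret over {horizon} rounds: {regret:.1f}")
```

Because every observation updates the shared posterior over theta, feedback from one arm informs estimates for all arms, which is exactly the advantage structured bandits have over treating arms independently.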

Papers