Full Bandit

Full-bandit problems, a core area of multi-armed bandit research, concern sequential decision-making under uncertainty in which the learner observes only the aggregate reward of each chosen action, rather than per-component (semi-bandit) feedback. Current research emphasizes efficient algorithms for handling complications such as delayed or composite feedback, non-monotone submodular reward functions, and the problem of distinguishing bandit settings from Markov decision processes. These advances sharpen the theoretical understanding of regret bounds and yield more robust, adaptable algorithms for applications ranging from online advertising to resource allocation. Algorithms such as Maillard sampling, which offers a closed-form sampling rule and improved regret bounds, illustrate the ongoing pursuit of both theoretical optimality and practical efficiency.
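To make the closed-form flavor of Maillard sampling concrete, here is a minimal sketch for a K-armed bandit with rewards in [0, 1]. It assumes the standard sub-Gaussian form of the rule, in which arm a is drawn with probability proportional to exp(-N_a Δ̂_a² / (2σ²)), where N_a is the pull count, Δ̂_a the empirical gap to the best arm, and σ² = 1/4 for bounded rewards; the function name and Bernoulli test environment are illustrative, not from any specific paper's code.

```python
import numpy as np

def maillard_sampling(means, T, sigma2=0.25, rng=None):
    """Sketch of Maillard sampling on a Bernoulli bandit.

    Each round, arm a is drawn with probability proportional to
    exp(-N_a * gap_a^2 / (2 * sigma2)), where gap_a is the empirical
    gap to the best empirical mean -- a closed-form sampling rule,
    with no posterior sampling or optimization step required.
    """
    rng = rng or np.random.default_rng(0)
    K = len(means)
    pulls = np.zeros(K)   # N_a: number of pulls of each arm
    sums = np.zeros(K)    # cumulative reward of each arm
    # Pull each arm once to initialize the empirical means.
    for a in range(K):
        sums[a] += rng.binomial(1, means[a])
        pulls[a] += 1
    for _ in range(K, T):
        mu_hat = sums / pulls
        gaps = mu_hat.max() - mu_hat
        # Closed-form sampling distribution over arms.
        logits = -pulls * gaps**2 / (2 * sigma2)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = rng.choice(K, p=p)
        sums[a] += rng.binomial(1, means[a])
        pulls[a] += 1
    return pulls

pulls = maillard_sampling([0.3, 0.5, 0.8], T=2000)
```

Because the sampling probabilities are available in closed form, the per-round cost is a single vectorized pass over the arms, which is the practical-efficiency point the overview alludes to.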

Papers