Bandit Problem
The multi-armed bandit problem is a sequential decision-making framework in which an agent maximizes cumulative reward by strategically selecting among actions (arms) with uncertain payoffs. Current research emphasizes efficient algorithms for a range of settings, including contextual bandits (which use neural networks to model reward functions), batched bandits (which adapt under limited rounds of feedback), and bandits with non-stationary rewards or adversarial environments. These advances are improving online recommendation systems, clinical trials, and other applications that require adaptive learning under uncertainty, with a strong focus on minimizing regret: the gap between the reward of the optimal policy and the reward actually achieved.
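To make the regret objective concrete, here is a minimal sketch of the classic UCB1 strategy on a Bernoulli bandit. The arm success probabilities, horizon, and seed are illustrative choices, not drawn from any particular paper.

```python
import math
import random

def ucb1(arm_probs, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit; return (total reward, pulls per arm)."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k          # times each arm has been pulled
    sums = [0.0] * k          # cumulative reward collected per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # pull each arm once to initialize estimates
        else:
            # UCB1 index: empirical mean plus an exploration bonus that
            # shrinks as an arm accumulates pulls
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

probs = [0.2, 0.5, 0.8]       # illustrative arm means; arm 2 is optimal
reward, pulls = ucb1(probs, horizon=5000)
# Pseudo-regret versus always playing the best arm:
regret = max(probs) * 5000 - reward
```

Because the exploration bonus decays with pull count, suboptimal arms are sampled only logarithmically often, so the pseudo-regret grows sublinearly in the horizon.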