Bandit Problem
The multi-armed bandit problem is a sequential decision-making framework in which an agent aims to maximize cumulative reward by strategically selecting actions (arms) with uncertain payoffs. Current research emphasizes efficient algorithms for a range of settings, including contextual bandits (using neural networks to model reward functions), batched bandits (where actions are committed in batches and feedback arrives only between batches), and settings with non-stationary rewards or adversarial environments. These advances are driving improvements in online recommendation systems, clinical trials, and other applications requiring adaptive learning under uncertainty, with a strong focus on minimizing regret (the gap between the reward of the optimal policy and the reward actually collected).
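The regret objective is easiest to see in the stochastic setting. Below is a minimal sketch, assuming Bernoulli-reward arms and the classic UCB1 index strategy; the arm means, horizon, and the function name run_ucb1 are illustrative choices, not drawn from any specific paper listed here.

```python
# Minimal sketch of regret minimization on a stochastic multi-armed bandit.
# Arm means, horizon, and the UCB1 strategy are illustrative assumptions.
import math
import random


def run_ucb1(arm_means, horizon, seed=0):
    """Play `horizon` rounds of UCB1 and return cumulative (pseudo-)regret."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of pulls per arm
    sums = [0.0] * n_arms      # total observed reward per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # pull each arm once to initialize estimates
        else:
            # Pick the arm with the highest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli payoff
        counts[arm] += 1
        sums[arm] += reward
        # Regret: expected reward lost relative to always playing the best arm.
        regret += best_mean - arm_means[arm]
    return regret


if __name__ == "__main__":
    print(run_ucb1(arm_means=[0.2, 0.5, 0.7], horizon=10_000))
```

The exploration bonus is what keeps regret growing only logarithmically with the horizon for UCB-style algorithms; contextual and batched variants replace the per-arm empirical means with learned reward models or batch-level statistics.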