Multi-Armed Bandit
The multi-armed bandit (MAB) is a framework for sequential decision-making under uncertainty, in which a learner maximizes cumulative reward by strategically selecting actions (arms) with unknown payoffs. Current research focuses on improving algorithmic efficiency and robustness, addressing challenges such as action erasures, limited memory, and reward contamination, often through variants of Thompson sampling, Upper Confidence Bound (UCB), and successive elimination algorithms. These advances enable more efficient and reliable decision-making in dynamic environments, with applications in online advertising, clinical trials, and resource allocation.
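To make the exploration-exploitation trade-off concrete, here is a minimal sketch of the classic UCB1 rule on Bernoulli arms. The arm count, horizon, and payoff probabilities below are illustrative assumptions, not drawn from any of the listed papers.

```python
import math
import random


class UCB1:
    """Minimal UCB1 bandit: pick the arm with the highest upper confidence bound."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # times each arm has been pulled
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Pull each arm once before applying the UCB rule.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # UCB score: empirical mean plus exploration bonus sqrt(2 ln t / n_i).
        scores = [
            value + math.sqrt(2 * math.log(total) / count)
            for value, count in zip(self.values, self.counts)
        ]
        return scores.index(max(scores))

    def update(self, arm, reward):
        # Incremental update of the running mean reward for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.7]  # hypothetical Bernoulli payoff probabilities
    bandit = UCB1(len(probs))
    for _ in range(1000):
        arm = bandit.select_arm()
        reward = 1.0 if random.random() < probs[arm] else 0.0
        bandit.update(arm, reward)
    print("pull counts per arm:", bandit.counts)
```

Over time the exploration bonus shrinks for frequently pulled arms, so pulls concentrate on the empirically best arm while sub-optimal arms are still sampled occasionally; Thompson sampling and successive elimination manage the same trade-off through posterior sampling and arm discarding, respectively.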