Bandit Learning
Bandit learning is a framework for sequential decision-making under uncertainty. Its goal is to maximize cumulative reward by balancing exploration (trying different options to learn their value) against exploitation (choosing the option that currently looks best). Current research focuses on efficient algorithms, such as Thompson sampling and variants of upper confidence bound (UCB) methods, for a range of bandit models: contextual bandits, linear bandits, and settings that incorporate offline data or involve high-dimensional spaces. Applications include hyperparameter optimization in machine learning, personalized recommendations, and robotic control.
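To make the exploration/exploitation trade-off concrete, here is a minimal sketch of Thompson sampling on a simulated Bernoulli bandit. It assumes each arm pays a reward of 1 with a fixed, unknown probability and places a Beta(1, 1) prior on each arm's mean. The arm probabilities, round count, seed, and the function name `thompson_sampling` are hypothetical choices for illustration, not taken from any of the papers listed here.

```python
import random


def thompson_sampling(arm_probs, n_rounds, seed=0):
    """Bernoulli Thompson sampling with Beta(1, 1) priors (illustrative sketch).

    arm_probs: hypothetical true success probability of each arm, used only
    to simulate rewards; the algorithm itself never reads these values.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    successes = [0] * k  # number of rewards equal to 1 observed per arm
    failures = [0] * k   # number of rewards equal to 0 observed per arm
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw a plausible mean for each arm from its Beta posterior.
        # Arms with few observations give widely spread samples, so they
        # still get tried occasionally (exploration).
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        # Play the arm whose sampled mean is highest (exploitation).
        arm = max(range(k), key=lambda i: samples[i])
        # Simulated environment: Bernoulli reward with the arm's probability.
        reward = 1 if rng.random() < arm_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


pulls, total = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls, total)
```

Over time the posterior for the best arm concentrates, so most pulls shift to it while the weaker arms are sampled only rarely. A UCB method would replace the random posterior draw with a deterministic optimistic estimate (empirical mean plus a confidence bonus).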
Papers
Change Detection-Based Procedures for Piecewise Stationary MABs: A Modular Approach
Yu-Han Huang, Argyrios Gerogiannis, Subhonmesh Bose, Venugopal V. Veeravalli
HPC Application Parameter Autotuning on Edge Devices: A Bandit Learning Approach
Abrar Hossain, Abdel-Hameed A. Badawy, Mohammad A. Islam, Tapasya Patki, Kishwar Ahmed