Multi-Armed Bandit
Multi-armed bandits (MABs) are a framework for sequential decision-making under uncertainty, in which a learner maximizes cumulative reward by strategically selecting actions (arms) with unknown payoff distributions. Current research emphasizes extending MABs to non-stationary environments, incorporating human trust and biases, and addressing computational challenges through algorithms such as Thompson Sampling and Upper Confidence Bound (UCB) variants, as well as novel architectures like Bandit Networks. These advances improve diverse applications, including personalized recommendations, resource allocation, and financial portfolio optimization, by enabling more efficient and adaptive decision-making in complex, real-world scenarios.
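For concreteness, below is a minimal sketch of Thompson Sampling on a Bernoulli bandit, one of the algorithms named above. The arm payoff probabilities and the Beta(1, 1) priors are illustrative assumptions and are not drawn from any of the papers listed here.

```python
import numpy as np

# Minimal Thompson Sampling sketch for a Bernoulli bandit.
# Arm probabilities below are assumed for illustration only.
rng = np.random.default_rng(0)
true_probs = np.array([0.2, 0.5, 0.7])   # unknown payoff probability of each arm
successes = np.ones(len(true_probs))     # Beta posterior alpha counts (Beta(1,1) prior)
failures = np.ones(len(true_probs))      # Beta posterior beta counts
total_reward = 0

for t in range(10_000):
    # Draw a plausible mean for each arm from its Beta posterior and play the best one.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))
    reward = int(rng.random() < true_probs[arm])
    successes[arm] += reward
    failures[arm] += 1 - reward
    total_reward += reward

print("Cumulative reward:", total_reward)
print("Posterior means:  ", successes / (successes + failures))
```

Because each arm is sampled from its posterior rather than its point estimate, the algorithm keeps exploring uncertain arms while increasingly exploiting the apparent best one, which is the exploration-exploitation trade-off MAB methods are designed to manage.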
Papers
WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings
Haochen Song, Ilya Musabirov, Ananya Bhattacharjee, Audrey Durand, Meredith Franklin, Anna Rafferty, Joseph Jay Williams
Multi-armed Bandit and Backbone boost Lin-Kernighan-Helsgaun Algorithm for the Traveling Salesman Problems
Long Wang, Jiongzhi Zheng, Zhengda Xiong, Kun He
Change Detection-Based Procedures for Piecewise Stationary MABs: A Modular Approach
Yu-Han Huang, Argyrios Gerogiannis, Subhonmesh Bose, Venugopal V. Veeravalli
HPC Application Parameter Autotuning on Edge Devices: A Bandit Learning Approach
Abrar Hossain, Abdel-Hameed A. Badawy, Mohammad A. Islam, Tapasya Patki, Kishwar Ahmed