Multi-Armed Bandit
Multi-armed bandits (MABs) are a framework for sequential decision-making under uncertainty in which an agent maximizes cumulative reward by repeatedly choosing among actions (arms) with unknown payoff distributions. Current research emphasizes extending MABs to non-stationary environments, incorporating human trust and biases, and addressing computational challenges through algorithms such as Thompson Sampling and Upper Confidence Bound (UCB) variants, as well as novel architectures like Bandit Networks. These advances are improving diverse applications, including personalized recommendation, resource allocation, and financial portfolio optimization, by enabling more efficient and adaptive decision-making in complex, real-world settings.
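As a concrete illustration of the two algorithm families mentioned above, the following is a minimal sketch of Thompson Sampling and UCB1 on a simulated Bernoulli bandit. The arm payoff probabilities and horizon are made-up values for illustration, and the code is not drawn from any of the papers listed below.

```python
# Minimal sketch: Thompson Sampling and UCB1 on a simulated Bernoulli bandit.
# The arm probabilities below are hypothetical, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_probs = np.array([0.2, 0.5, 0.7])  # hypothetical arm payoff probabilities
n_arms, horizon = len(true_probs), 5000

def thompson_sampling():
    # Beta(1, 1) prior on each arm's success probability.
    alpha, beta = np.ones(n_arms), np.ones(n_arms)
    total = 0.0
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))  # sample each posterior, play the best
        r = rng.random() < true_probs[arm]           # Bernoulli reward
        alpha[arm] += r
        beta[arm] += 1 - r
        total += r
    return total

def ucb1():
    counts, sums = np.zeros(n_arms), np.zeros(n_arms)
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:                              # play each arm once to initialize
            arm = t - 1
        else:                                        # optimism in the face of uncertainty
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        r = rng.random() < true_probs[arm]
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

print("Thompson Sampling total reward:", thompson_sampling())
print("UCB1 total reward:", ucb1())
```

Both policies balance exploration and exploitation: Thompson Sampling does so by sampling from a posterior over each arm's mean, while UCB1 adds a confidence bonus that shrinks as an arm is played more often.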
Papers
Batched Online Contextual Sparse Bandits with Sequential Inclusion of Features
Rowan Swiers, Subash Prabanantham, Andrew Maher
A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization
Tiago Cunha, Andrea Marchini
Batch Ensemble for Variance Dependent Regret in Stochastic Bandits
Asaf Cassel (1), Orin Levy (1), Yishay Mansour (1 and 2) ((1) School of Computer Science, Tel Aviv University, (2) Google Research, Tel Aviv)
Neural Network-Based Bandit: A Medium Access Control for the IIoT Alarm Scenario
Prasoon Raghuwanshi, Onel Luis Alcaraz López, Neelesh B. Mehta, Hirley Alves, Matti Latva-aho
Identifiable latent bandits: Combining observational data and exploration for personalized healthcare
Ahmet Zahid Balcıoğlu, Emil Carlsson, Fredrik D. Johansson