Thompson Sampling
Thompson Sampling is a Bayesian approach to sequential decision-making that balances exploration and exploitation by maintaining a posterior over the unknown problem parameters and acting on samples drawn from it. Current research extends it beyond simple multi-armed bandits to more complex settings such as reinforcement learning, contextual bandits (including those with noisy or partially observable contexts), and combinatorial bandits, often using neural architectures (e.g., Graph Neural Networks) to handle high-dimensional data or non-stationary environments. These advances improve sample efficiency over classical methods and address challenges in fields such as finance, robotics, and personalized medicine.
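To make the core idea concrete, below is a minimal sketch of Thompson Sampling for a Beta-Bernoulli multi-armed bandit; the function name, the simulated arm probabilities, and the round count are illustrative assumptions and are not taken from the papers listed here.

import numpy as np

def thompson_sampling_bernoulli(true_probs, n_rounds=1000, seed=0):
    # Illustrative simulation: true_probs are the (unknown to the learner) arm success rates.
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    alpha = np.ones(k)  # Beta(1, 1) prior on each arm's success probability
    beta = np.ones(k)
    total_reward = 0
    for _ in range(n_rounds):
        # Sample a plausible success probability for each arm from its posterior,
        # then play the arm whose sample is largest (exploration via posterior randomness).
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        reward = rng.random() < true_probs[arm]  # simulated Bernoulli feedback
        # Conjugate posterior update: success increments alpha, failure increments beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward, alpha, beta

# Example: three arms with unknown success rates; play concentrates on the best arm over time.
reward, alpha, beta = thompson_sampling_bernoulli([0.2, 0.5, 0.7])
print(reward, alpha / (alpha + beta))

The Beta-Bernoulli pairing keeps the posterior update to a simple count increment; the extensions surveyed above replace this conjugate model with richer posteriors (e.g., neural network approximations) while keeping the same sample-then-act loop.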
Papers
WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings
Haochen Song, Ilya Musabirov, Ananya Bhattacharjee, Audrey Durand, Meredith Franklin, Anna Rafferty, Joseph Jay Williams
Stochastically Constrained Best Arm Identification with Thompson Sampling
Le Yang, Siyang Gao, Cheng Li, Yi Wang