Thompson Sampling

Thompson Sampling is a Bayesian approach to sequential decision-making that balances exploration and exploitation: at each step it draws a sample from the posterior distribution over each action's reward and plays the action whose sample is best, so actions with uncertain estimates are still tried occasionally. Current research extends it beyond simple multi-armed bandits to reinforcement learning, contextual bandits (including those with noisy or partially observable contexts), and combinatorial bandits, often using neural network models (e.g., Graph Neural Networks) to handle high-dimensional data or non-stationary environments. These extensions aim to improve sample efficiency over classical methods, with applications in fields such as finance, robotics, and personalized medicine.
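To make the basic idea concrete, here is a minimal sketch of Thompson Sampling for the simplest setting mentioned above, a multi-armed bandit with binary (Bernoulli) rewards. It is an illustration under stated assumptions, not an implementation taken from any particular paper: the function name `thompson_sampling`, the simulated arm probabilities `true_probs`, and the uniform Beta(1, 1) prior are all choices made for this example.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm (unknown to
                the algorithm; used only to simulate rewards).
    Returns (pulls per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)

    # Posterior for each arm is Beta(alpha, beta); start from the
    # uniform prior Beta(1, 1) (an assumption of this sketch).
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]

        # Act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Conjugate update: a success increments alpha, a failure beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

Exploration here comes entirely from the randomness of the posterior draws: an arm with few observations has a wide posterior and will sometimes produce the largest sample, while a well-explored poor arm rarely does. The extensions described above typically replace the Beta posterior with a richer model while keeping the same sample-then-act loop.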

Papers