Paper ID: 2105.09232

Diffusion Approximations for Thompson Sampling

Lin Fan, Peter W. Glynn

We study the behavior of Thompson sampling from the perspective of weak convergence. In the regime where the gaps between arm means scale as $1/\sqrt{n}$ with the time horizon $n$, we show that the dynamics of Thompson sampling evolve according to discrete versions of SDE's and stochastic ODE's. As $n \to \infty$, we show that the dynamics converge weakly to solutions of the corresponding SDE's and stochastic ODE's. Our weak convergence theory is developed from first principles using the Continuous Mapping Theorem, and can be easily adapted to analyze other sampling-based bandit algorithms. In this regime, we also show that the weak limits of the dynamics of many sampling-based algorithms -- including Thompson sampling designed for any exponential family of rewards, and algorithms involving bootstrap-based sampling -- coincide with those of Gaussian Thompson sampling. Moreover, in this regime, these algorithms are generally robust to model mis-specification.

Submitted: May 19, 2021