Paper ID: 2203.10214

Thompson Sampling on Asymmetric $\alpha$-Stable Bandits

Zhendong Shi, Ercan E. Kuruoglu, Xiaoli Wei

In algorithm optimization in reinforcement learning, how to deal with the exploration-exploitation dilemma is particularly important. Multi-armed bandit problem can optimize the proposed solutions by changing the reward distribution to realize the dynamic balance between exploration and exploitation. Thompson Sampling is a common method for solving multi-armed bandit problem and has been used to explore data that conform to various laws. In this paper, we consider the Thompson Sampling approach for multi-armed bandit problem, in which rewards conform to unknown asymmetric $\alpha$-stable distributions and explore their applications in modelling financial and wireless data.

Submitted: Mar 19, 2022