Gaussian Reward

Gaussian reward models are a standard assumption in multi-armed bandit problems, a framework for sequential decision-making under uncertainty. Under this model, each pull of an arm returns a reward drawn from a normal distribution with an arm-specific mean. Current research aims to make algorithms for best-arm identification and policy evaluation in these settings more efficient and robust, particularly when reward variances are unknown, when rewards are heavy-tailed rather than Gaussian, and when contextual bandits face covariate shift. These advances matter for resource allocation in applications such as clinical trials, online advertising, and reinforcement learning, where decisions must be made reliably from a limited number of samples.
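To make the best-arm identification setting concrete, the sketch below simulates Gaussian arms and runs successive elimination, one standard fixed-confidence technique chosen here purely for illustration. It assumes independent rewards with a known, common standard deviation `sigma`; the unknown-variance case mentioned above would need different confidence bounds. The function name `successive_elimination` and its parameters are hypothetical. All active arms are sampled once per round, and an arm is dropped when its upper confidence bound falls below the leader's lower bound. The radius `sigma * sqrt(2 log(4 K t^2 / delta) / t)` comes from a Gaussian tail bound plus a union bound over arms and rounds.

```python
import math
import random


def successive_elimination(means, sigma=1.0, delta=0.05, max_rounds=100_000, seed=0):
    """Hypothetical sketch: best-arm identification on simulated Gaussian arms.

    Assumes rewards for arm a are i.i.d. N(means[a], sigma**2) with sigma known.
    Returns (recommended_arm, total_samples_used).
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    sums = [0.0] * k  # running reward totals; every active arm has exactly t samples
    t = 0
    total = 0
    if k == 1:
        return 0, total

    while len(active) > 1 and t < max_rounds:
        t += 1
        # Sample each surviving arm once per round.
        for a in active:
            sums[a] += rng.gauss(means[a], sigma)
            total += 1
        # Confidence radius from the Gaussian tail bound, union-bounded over
        # k arms and all rounds so the overall failure probability is below delta.
        radius = sigma * math.sqrt(2.0 * math.log(4.0 * k * t * t / delta) / t)
        est = {a: sums[a] / t for a in active}
        leader = max(est.values())
        # Keep an arm only if its upper bound reaches the leader's lower bound.
        active = [a for a in active if est[a] + radius >= leader - radius]

    # If max_rounds was hit (e.g. tied means), fall back to the best empirical mean.
    best = max(active, key=lambda a: sums[a] / t)
    return best, total


best, used = successive_elimination([0.0, 0.5, 1.0, 2.0], sigma=1.0, delta=0.05)
print(best, used)
```

On the event that every empirical mean stays within its radius, which holds with probability at least `1 - delta`, the arm with the highest true mean is never eliminated. If the loop ends with a single arm, that arm is therefore the best one. Smaller gaps between arm means, or a larger `sigma`, increase the number of samples needed before elimination.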

Papers