Gaussian Process Bandit

Gaussian Process Bandits (GP bandits) address the problem of sequentially selecting actions to maximize cumulative reward from an unknown reward function, modeled as a draw from a Gaussian process. Current research focuses on extending GP bandit algorithms to handle complexities such as combinatorial action spaces, delayed or aggregated feedback, and adversarial corruptions, often via variants of the Upper Confidence Bound (UCB) and Thompson Sampling strategies. These advances improve the robustness and applicability of GP bandits in fields such as online optimization, resource allocation, and hyperparameter tuning, where balancing exploration against exploitation efficiently is crucial. Tighter regret bounds and algorithms that adapt to model misspecification are also active areas of investigation.
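As a concrete illustration of the UCB approach mentioned above, the sketch below runs a minimal GP-UCB loop over a discrete action set: a GP posterior is fit to the observations so far, and the next action maximizes the posterior mean plus a confidence-scaled standard deviation. The kernel, the synthetic reward function `f`, the noise level, and the confidence width `beta_t` are all illustrative assumptions, not taken from any particular paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel k(x, x') = s^2 * exp(-||x - x'||^2 / (2 l^2)).
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_cand, noise=1e-2):
    # Standard GP regression posterior mean and variance at candidate points.
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(np.diag(rbf_kernel(X_cand, X_cand)) - np.sum(v**2, axis=0),
                  1e-12, None)
    return mu, var

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.5 * np.cos(7 * x)  # hypothetical unknown reward
X_cand = np.linspace(0.0, 1.0, 200)[:, None]       # discrete action set

# Initialize with one random observation, then iterate GP-UCB.
X_obs = X_cand[rng.integers(len(X_cand))][None, :]
y_obs = f(X_obs.ravel()) + 0.1 * rng.standard_normal(1)
for t in range(1, 31):
    mu, var = gp_posterior(X_obs, y_obs, X_cand)
    beta_t = 2.0 * np.log(len(X_cand) * t**2)      # illustrative confidence width
    x_t = X_cand[np.argmax(mu + np.sqrt(beta_t * var))][None, :]
    y_t = f(x_t.ravel()) + 0.1 * rng.standard_normal(1)
    X_obs = np.vstack([X_obs, x_t])
    y_obs = np.concatenate([y_obs, y_t])

print("best observed reward:", y_obs.max())
```

Replacing the UCB acquisition with the argmax of a single sample drawn from the posterior would turn the same loop into a Thompson Sampling variant.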

Papers