Contextual Bandit
Contextual bandits are a machine learning framework for sequential decision-making in which actions are chosen based on observed contextual information to maximize cumulative reward. Current research emphasizes algorithm efficiency and robustness, focusing on techniques such as Thompson Sampling and Upper Confidence Bound (UCB) variants, and on model architectures such as neural networks and diffusion models that can handle complex reward functions and high-dimensional contexts. The framework is broadly applicable to personalized recommendations, online advertising, and resource allocation, and recent work addresses challenges such as delayed feedback, interference between actions, and the need for interpretability and fairness.
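As a rough illustration of the decision loop described above, the following Python sketch runs a disjoint linear UCB policy (often called LinUCB) on a simulated two-arm problem. It is a minimal example under stated assumptions, not the method of any paper listed here: the names LinUCBArm and run_linucb, the linear reward model, the noise level, and the exploration weight alpha are all hypothetical choices made for illustration.

```python
import math
import random


class LinUCBArm:
    """Ridge-regression state for one arm (illustrative, hypothetical class)."""

    def __init__(self, dim):
        # Inverse of A = I + sum(x x^T); starts as the identity (ridge weight 1).
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        # b = sum(reward * x); starts at zero.
        self.b = [0.0] * dim

    def _matvec(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        # Score = estimated mean reward + alpha * confidence width.
        theta = self._matvec(self.b)
        mean = sum(t * xi for t, xi in zip(theta, x))
        quad = sum(xi * v for xi, v in zip(x, self._matvec(x)))
        return mean + alpha * math.sqrt(max(quad, 0.0))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv after A += x x^T.
        A_inv_x = self._matvec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, A_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
            self.b[i] += reward * x[i]


def run_linucb(n_rounds=2000, alpha=1.0, seed=0):
    """Simulate the bandit loop and return the average per-round regret."""
    rng = random.Random(seed)
    # Assumed ground truth: each arm's expected reward is linear in the context.
    true_theta = [[1.0, 0.0], [0.0, 1.0]]
    arms = [LinUCBArm(dim=2) for _ in true_theta]
    total_regret = 0.0
    for _ in range(n_rounds):
        x = [rng.random(), rng.random()]          # observe a context
        scores = [arm.ucb(x, alpha) for arm in arms]
        chosen = max(range(len(arms)), key=scores.__getitem__)
        expected = [sum(t * xi for t, xi in zip(theta, x))
                    for theta in true_theta]
        # Only the chosen arm's (noisy) reward is observed.
        reward = expected[chosen] + rng.gauss(0.0, 0.1)
        arms[chosen].update(x, reward)
        total_regret += max(expected) - expected[chosen]
    return total_regret / n_rounds
```

Calling run_linucb() returns the average gap between the best arm's expected reward and the chosen arm's expected reward. Replacing the UCB score with a draw from each arm's posterior would turn the same loop into a Thompson Sampling variant.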
Papers
WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings
Haochen Song, Ilya Musabirov, Ananya Bhattacharjee, Audrey Durand, Meredith Franklin, Anna Rafferty, Joseph Jay Williams
Truthful mechanisms for linear bandit games with private contexts
Yiting Hu, Lingjie Duan