Linear Contextual Bandit

Linear contextual bandits address the problem of sequentially selecting actions (arms) based on observed context vectors to maximize cumulative reward, trading off exploration against exploitation under the assumption that expected reward is linear in the context. Current research focuses on improving algorithmic efficiency and robustness, exploring variations such as multi-task learning, interference-aware models, and settings with adversarial corruptions or misspecification, often building on algorithms such as LinUCB, Thompson Sampling, and Follow-The-Regularized-Leader (FTRL) with various modifications. These advances matter for applications such as personalized recommendation, online advertising, and resource allocation, where they offer improved theoretical guarantees and practical performance in dynamic, uncertain environments. The field is also actively investigating optimal batching strategies and the interplay between representation learning and regret minimization.
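
To make the setting concrete, below is a minimal sketch of LinUCB in its disjoint-model form (per-arm ridge regression plus an upper-confidence exploration bonus), assuming rewards are approximately linear in a d-dimensional context, r ≈ ⟨θ_a, x⟩ + noise. The class name, the `alpha` exploration parameter, and the synthetic demo data are illustrative choices, not drawn from any specific paper listed here.

```python
import numpy as np

class LinUCB:
    """Disjoint-model LinUCB sketch: one ridge regression per arm plus a UCB bonus.

    `alpha` scales the exploration bonus; `d` is the context dimension.
    """

    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        # Per-arm ridge statistics: A_a = I + sum x x^T, b_a = sum r x.
        self.A = np.stack([np.eye(d) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, d))

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a                           # ridge estimate of the arm's parameter
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)   # width of the confidence ellipsoid along x
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed (context, reward) pair into the chosen arm's statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Demo on synthetic data: hidden per-arm parameters with Gaussian noise (hypothetical setup).
rng = np.random.default_rng(0)
d, n_arms, T = 5, 3, 2000
theta_star = rng.normal(size=(n_arms, d))
bandit = LinUCB(n_arms, d, alpha=1.0)
for t in range(T):
    x = rng.normal(size=d)
    a = bandit.select(x)
    r = theta_star[a] @ x + 0.1 * rng.normal()  # linear reward + noise
    bandit.update(a, x, r)
```

The exploration bonus shrinks as an arm accumulates observations in a given context direction, which is how the algorithm shifts from exploration to exploitation; a production implementation would typically maintain A⁻¹ incrementally (e.g., via rank-one updates) rather than inverting A on every selection.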

Papers