Contextual Linear Bandit

Contextual linear bandits are sequential decision-making problems in which a learner observes a context, selects an action whose expected reward is assumed to be linear in the action's feature vector, and aims to maximize cumulative reward from bandit feedback. Current research focuses on improving algorithmic efficiency and robustness across settings such as hybrid reward models, strategic agents, federated learning environments, and noise or adversarial corruptions, often building on LinUCB and its variants, Thompson Sampling, and Lasso-based methods. These advances are significant for applications such as recommender systems, personalized medicine, and resource allocation, where efficient and reliable learning from contextual data is crucial. The field is also actively pursuing optimal regret bounds and addressing challenges such as privacy preservation and high-dimensional data.
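
As an illustration of the LinUCB family mentioned above, the sketch below maintains a regularized least-squares estimate of a shared reward parameter and adds an exploration bonus proportional to the estimate's uncertainty in each action's direction. This is a minimal sketch, not a reference implementation: the shared-parameter model, the exploration weight `alpha`, the simulated environment, and the Gaussian noise level are all illustrative assumptions.

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB with a single shared linear model.

    Keeps A = reg * I + sum(x x^T) and b = sum(r * x), estimates
    theta = A^{-1} b, and scores each candidate feature vector x by
    theta^T x + alpha * sqrt(x^T A^{-1} x)  (estimated mean + exploration bonus).
    """

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha
        self.A = reg * np.eye(dim)   # regularized design matrix
        self.b = np.zeros(dim)       # reward-weighted feature sum

    def select(self, action_features):
        """Return the index of the action with the highest upper confidence bound."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", action_features, A_inv, action_features))
        return int(np.argmax(action_features @ theta + self.alpha * bonus))

    def update(self, x, reward):
        """Rank-one update of the design matrix and reward vector."""
        self.A += np.outer(x, x)
        self.b += reward * x


# Toy run against a simulated linear environment (hidden theta_star is illustrative only).
rng = np.random.default_rng(0)
dim, n_actions, horizon = 5, 10, 2000
theta_star = rng.normal(size=dim) / np.sqrt(dim)
agent = LinUCB(dim, alpha=1.0)

regret = 0.0
for t in range(horizon):
    contexts = rng.normal(size=(n_actions, dim))               # context-dependent action features
    arm = agent.select(contexts)
    reward = contexts[arm] @ theta_star + 0.1 * rng.normal()   # noisy linear reward
    agent.update(contexts[arm], reward)
    regret += np.max(contexts @ theta_star) - contexts[arm] @ theta_star

print(f"cumulative regret after {horizon} rounds: {regret:.2f}")
```

Confidence-ellipsoid analyses of algorithms in this style typically yield cumulative regret on the order of d·√T up to logarithmic factors, which is the kind of guarantee the optimal-regret line of work aims to match or improve.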

Papers