Contextual Bandit Setting

The contextual bandit setting addresses the problem of sequentially selecting actions in environments where rewards depend on both the chosen action and contextual information. Current research focuses on robustly learning optimal policies from potentially biased or incomplete offline data, employing techniques such as offline policy learning, causal inference, and decision transformers, with particular attention to mitigating confounding variables and coping with large action spaces. These advances are significant for applications such as personalized medicine (e.g., optimizing warfarin dosage) and online recommendation systems, where efficient and reliable decision-making under uncertainty is crucial. Further work explores how to leverage collaborative learning effectively and how to handle adversarial agents in these settings.
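To make the setting concrete, below is a minimal, self-contained sketch (all names, dimensions, and parameters are illustrative assumptions, not taken from any of the papers listed): an epsilon-greedy linear bandit interacts with a simulated environment, recording the propensity of each logged action, and the resulting log is then reused for a simple inverse-propensity-scoring (IPS) off-policy value estimate, the basic building block behind the offline policy learning mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, d = 4, 5                            # hypothetical: 4 arms, 5-dim contexts
theta_true = rng.normal(size=(n_actions, d))   # simulator-only reward parameters
eps = 0.1                                      # exploration rate of the logging policy

# Per-arm ridge-regression statistics for a learned linear reward model.
A = np.stack([np.eye(d)] * n_actions)          # Gram matrices (one per arm)
b = np.zeros((n_actions, d))                   # response vectors

logs = []                                      # (context, action, reward, propensity)

for t in range(5000):
    x = rng.normal(size=d)                     # observe context
    theta_hat = np.stack([np.linalg.solve(A[a], b[a]) for a in range(n_actions)])
    greedy = int(np.argmax(theta_hat @ x))
    # Epsilon-greedy action choice; recording propensities makes the log
    # usable for unbiased offline evaluation later.
    if rng.random() < eps:
        action = int(rng.integers(n_actions))
    else:
        action = greedy
    prop = eps / n_actions + (1 - eps) * (action == greedy)
    reward = theta_true[action] @ x + rng.normal(scale=0.1)  # bandit feedback
    A[action] += np.outer(x, x)                # update only the chosen arm:
    b[action] += reward * x                    # other arms' rewards are unobserved
    logs.append((x, action, reward, prop))

# Offline evaluation: IPS estimate of the value of a fixed target policy,
# here "act greedily with respect to the final learned model".
theta_hat = np.stack([np.linalg.solve(A[a], b[a]) for a in range(n_actions)])
ips = np.mean([
    (int(np.argmax(theta_hat @ x)) == a) * r / p
    for x, a, r, p in logs
])
print(f"IPS estimate of target policy value: {ips:.3f}")
```

The key design point the sketch illustrates is that offline policy learning hinges on the logging policy: because epsilon-greedy assigns every action nonzero probability and the propensities are recorded, the IPS estimator is unbiased for the target policy's value, whereas a purely greedy logger would leave parts of the action space unevaluable, which is exactly the bias problem the research above aims to mitigate.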

Papers