Bandit Identification

Bandit identification (often called best-arm identification) focuses on efficiently finding the best option (arm) in a set of possibilities. The usual goal is to identify that arm with as few samples as possible at a given confidence level, or to maximize the chance of a correct choice under a fixed sampling budget. This objective is related to, but distinct from, minimizing the cumulative cost of suboptimal choices (regret). Current research emphasizes extensions to more complex scenarios: federated learning (combining personalized and global objectives), user non-compliance (modeling users who abandon recommendations), and noisy or incomplete feedback (e.g., in restless bandits or with graph-structured feedback). By relaxing the assumptions of classical bandit models and yielding more robust, sample-efficient algorithms, these extensions broaden the reach of bandit methods to real-world problems such as personalized recommendation, resource allocation, and industrial control systems.
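To make the identification objective concrete, here is a minimal sketch of successive elimination, one classical fixed-confidence best-arm identification strategy. It is not drawn from any particular paper on this topic. It assumes Bernoulli rewards simulated from a list of true means that the algorithm itself never reads, and it uses a Hoeffding-style confidence radius with a union bound over arms and rounds. The function name `successive_elimination` and its parameters (`means`, `delta`, `max_rounds`, `seed`) are hypothetical, chosen for illustration.

```python
import math
import random


def successive_elimination(means, delta=0.05, max_rounds=100_000, seed=0):
    """Illustrative fixed-confidence best-arm identification (successive elimination).

    Assumption: `means` are the true Bernoulli success probabilities of a
    simulated bandit. They are used only to generate rewards; the algorithm
    decides from observed rewards alone.
    Returns (index of the identified best arm, total number of pulls).
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))      # arms not yet ruled out
    sums = [0.0] * k             # cumulative observed reward per arm
    pulls = 0
    t = 0                        # rounds so far; every active arm has t samples
    while len(active) > 1 and t < max_rounds:
        t += 1
        # Sample each surviving arm once per round.
        for a in active:
            sums[a] += 1.0 if rng.random() < means[a] else 0.0
            pulls += 1
        # Hoeffding radius with a union bound over k arms and all rounds,
        # so every estimate stays inside its interval with prob. >= 1 - delta.
        radius = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        est = {a: sums[a] / t for a in active}
        leader_lcb = max(est.values()) - radius
        # Eliminate arms whose upper bound falls below the leader's lower bound.
        active = [a for a in active if est[a] + radius >= leader_lcb]
    # If the round budget ran out with several survivors, return the empirical leader.
    best = max(active, key=lambda a: sums[a] / t) if t > 0 else active[0]
    return best, pulls


if __name__ == "__main__":
    arm, n = successive_elimination([0.1, 0.5, 0.9], delta=0.01)
    print(f"identified arm {arm} after {n} pulls")
```

With probability at least `1 - delta`, the call in the usage block returns arm 2. The number of pulls it reports is the sample cost that identification methods try to minimize. Successive elimination samples every surviving arm uniformly, which keeps the sketch simple; adaptive strategies such as LUCB or track-and-stop typically need fewer samples.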

Papers