Stochastic Bandit

Stochastic bandits are a class of sequential decision-making problems in which an agent repeatedly selects actions (arms) with uncertain rewards, balancing exploration of poorly understood arms against exploitation of arms that currently look best, so as to maximize cumulative reward. Current research emphasizes algorithms with improved regret bounds (the cumulative gap between the rewards collected and those of always playing the arm with the highest expected reward) across a range of settings, including linear bandits, neural bandits (e.g., with ReLU reward models), and variants that incorporate fairness constraints or handle noisy contexts and heavy-tailed reward distributions. These advances are driven by the need for efficient and robust solutions in applications ranging from online advertising and personalized recommendation to clinical decision support and resource allocation in large-scale systems. A significant focus is on achieving near-optimal regret while also addressing practical considerations such as computational efficiency and robustness to adversarial attacks or model misspecification.
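
To make the exploration–exploitation trade-off and the notion of regret concrete, here is a minimal sketch of one classic index policy, UCB1, on a Bernoulli bandit. It is illustrative only and not taken from any of the papers listed below; the arm means, horizon, and function names are assumptions chosen for the example.

```python
import math
import random

def ucb1(arm_means, horizon=10000, seed=0):
    """Minimal UCB1 sketch on a hypothetical Bernoulli bandit: pull each arm
    once, then pick the arm with the highest upper confidence bound, and
    track the cumulative pseudo-regret against the best arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # number of pulls per arm
    totals = [0.0] * k        # sum of observed rewards per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # initialization: pull each arm once
        else:
            # UCB index: empirical mean + exploration bonus sqrt(2 ln t / n_i)
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        totals[arm] += reward
        regret += best_mean - arm_means[arm]  # per-step gap to the optimal arm
    return regret

if __name__ == "__main__":
    # Three hypothetical arms; the third is optimal with mean 0.7.
    print("cumulative pseudo-regret:", ucb1([0.4, 0.5, 0.7]))
```

The exploration bonus shrinks as an arm is pulled more often, so under-sampled arms keep getting tried while well-estimated suboptimal arms are gradually abandoned; the printed pseudo-regret is the quantity that the regret bounds discussed above aim to control.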

Papers