Restless Bandit

Restless bandits model sequential decision-making problems in which a decision-maker repeatedly selects a limited number of options (arms). Every arm's state, and therefore its reward, keeps evolving over time whether or not the arm is selected, although the dynamics under the active and passive actions may differ. Computing an optimal policy is PSPACE-hard in general, so current research focuses on efficient approximate and learning algorithms, such as Whittle index policies, Thompson sampling, and Q-learning variants. These methods often incorporate contextual information and handle unknown transition dynamics. The framework has applications in resource allocation, healthcare, and robotics, where it is used to optimize systems with limited resources and changing conditions. Developing scalable, near-optimal policies for increasingly complex restless bandit settings remains an active area of research.
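To make the Whittle index idea concrete, here is a minimal sketch for the textbook discounted-reward formulation, not the method of any particular paper. It assumes each arm is a small Markov chain with a passive transition matrix `P0` and an active matrix `P1` (the arm moves under `P0` even when it is not pulled, which is what makes it "restless"), per-state rewards `r0`/`r1`, and a discount factor `beta`. The Whittle index of a state is the subsidy `lam` paid for staying passive at which passive and active actions become equally attractive. It is found here by binary search, assuming the index lies in the hypothetical range `[-50, 50]`. The function names and the example numbers are illustrative assumptions.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def q_values(P0: Matrix, P1: Matrix, r0: List[float], r1: List[float],
             lam: float, beta: float, tol: float = 1e-10) -> Tuple[List[float], List[float]]:
    """Value iteration for a single arm in which the passive action earns subsidy `lam`.

    Returns (Q_passive, Q_active), each indexed by state.
    """
    n = len(r0)
    V = [0.0] * n
    while True:
        q0 = [r0[s] + lam + beta * sum(P0[s][t] * V[t] for t in range(n)) for s in range(n)]
        q1 = [r1[s] + beta * sum(P1[s][t] * V[t] for t in range(n)) for s in range(n)]
        V_new = [max(a, b) for a, b in zip(q0, q1)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return q0, q1
        V = V_new


def whittle_index(state: int, P0: Matrix, P1: Matrix, r0: List[float], r1: List[float],
                  beta: float, lo: float = -50.0, hi: float = 50.0, iters: int = 60) -> float:
    """Binary-search the subsidy at which both actions are equally good in `state`.

    Assumes (hypothetically) that the arm is indexable and that the index lies in [lo, hi].
    """
    for _ in range(iters):
        lam = (lo + hi) / 2
        q0, q1 = q_values(P0, P1, r0, r1, lam, beta)
        if q1[state] > q0[state]:
            lo = lam  # acting is still strictly better, so the index is above lam
        else:
            hi = lam  # passivity is at least as good, so the index is at most lam
    return (lo + hi) / 2


def whittle_policy(indices: List[float], k: int) -> Set[int]:
    """Activate the k arms whose current states have the largest Whittle indices."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return set(order[:k])


if __name__ == "__main__":
    # Hypothetical two-state arm: state 1 is "good" and yields reward 1 regardless of action.
    P0 = [[0.9, 0.1], [0.4, 0.6]]  # passive: the arm still drifts between states
    P1 = [[0.5, 0.5], [0.1, 0.9]]  # active: higher chance of reaching or staying in state 1
    r = [0.0, 1.0]
    w = [whittle_index(s, P0, P1, r, r, beta=0.9) for s in range(2)]
    print("Whittle index per state:", w)
```

At each step, a Whittle index policy computes the index of every arm's current state and activates the `k` arms with the largest values, which is what `whittle_policy` does. This heuristic is only well-founded when the arms are indexable, a property worth checking for any concrete model before relying on the indices.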
