Restless Multi-Armed Bandit

Restless multi-armed bandits (RMABs) model sequential decision-making problems in which each arm is a Markov process whose state keeps evolving whether or not the arm is chosen, with transition probabilities that depend on whether it is acted on. Unlike in standard multi-armed bandits, choosing an arm therefore yields an immediate reward and also shapes that arm's future state, and a budget typically limits how many arms can be acted on at each step. Computing an optimal policy is intractable in general, largely because the joint state space grows exponentially with the number of arms. Current research therefore focuses on efficient algorithms, such as Q-learning variants (including deep Q-networks), Whittle index-based approaches, and newer methods such as GINO-Q. These advances are driving progress in diverse fields, including personalized education, resource allocation in public health, and communication-system optimization, by enabling more effective resource management in dynamic environments. Recent work also addresses non-Markovian behavior, fairness, and the incorporation of global rewards and network effects.
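The Whittle index idea can be illustrated on a single small arm. The sketch below is a minimal, hypothetical example, not the method of any paper listed here. It assumes a discounted objective, two actions per arm (0 = passive, 1 = active), known transition matrices, rewards that depend only on the state, and a budget of exactly `k` pulls per step. For a given state, it searches by bisection for the "subsidy" paid for staying passive at which acting and not acting become equally attractive; under indexability (which this sketch does not check) that value is the Whittle index. The policy then pulls the `k` arms whose current states have the largest indices. The names `q_values`, `whittle_index`, `whittle_policy`, `P_EX`, and `R_EX` are illustrative only.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def q_values(P: Sequence[Matrix], R: Sequence[float], subsidy: float,
             gamma: float = 0.95, iters: int = 500) -> Tuple[List[float], List[float]]:
    """Value iteration for one arm whose passive action also earns `subsidy`.

    P[a][s][t]: probability of moving s -> t under action a (0 = passive, 1 = active).
    R[s]: reward earned in state s under either action (an assumption of this sketch).
    Returns (Q_passive, Q_active) for every state.
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(iters):
        q0 = [R[s] + subsidy + gamma * sum(P[0][s][t] * V[t] for t in range(n)) for s in range(n)]
        q1 = [R[s] + gamma * sum(P[1][s][t] * V[t] for t in range(n)) for s in range(n)]
        V = [max(a, b) for a, b in zip(q0, q1)]
    return q0, q1


def whittle_index(P: Sequence[Matrix], R: Sequence[float], state: int,
                  gamma: float = 0.95, tol: float = 1e-6) -> float:
    """Bisection for the subsidy at which passive and active are equally good in `state`.

    This equals the Whittle index only if the arm is indexable; that is not checked here.
    """
    def advantage(lam: float) -> float:
        q0, q1 = q_values(P, R, lam, gamma)
        return q1[state] - q0[state]

    lo, hi = -1.0, 1.0
    while advantage(lo) <= 0:  # widen until acting is strictly better at `lo`
        lo *= 2
    while advantage(hi) > 0:   # widen until staying passive is at least as good at `hi`
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if advantage(mid) > 0:
            lo = mid  # acting still strictly better: subsidy too small
        else:
            hi = mid
    return (lo + hi) / 2


def whittle_policy(arms: Sequence[Tuple[Sequence[Matrix], Sequence[float]]],
                   states: Sequence[int], k: int, gamma: float = 0.95) -> List[int]:
    """Return the indices of the k arms whose current states have the largest Whittle indices."""
    idx = [whittle_index(P, R, s, gamma) for (P, R), s in zip(arms, states)]
    ranked = sorted(range(len(arms)), key=lambda i: idx[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical two-state arm: state 0 = "disengaged", state 1 = "engaged".
P_EX = [
    [[0.9, 0.1], [0.4, 0.6]],  # passive: engaged arms drift toward disengaged
    [[0.3, 0.7], [0.1, 0.9]],  # active: pulling pushes the arm toward engaged
]
R_EX = [0.0, 1.0]

if __name__ == "__main__":
    arms = [(P_EX, R_EX)] * 3
    print(whittle_policy(arms, [0, 1, 0], k=1))
```

Bisection on the subsidy is a simple, general choice for small arms, but each step solves a full single-arm MDP. This brute-force computation becomes expensive as per-arm state spaces grow or when transition probabilities are unknown.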

Papers