Multi-Armed Bandit
Multi-armed bandits (MABs) are a framework for sequential decision-making under uncertainty in which a learner maximizes cumulative reward by strategically selecting actions (arms) with unknown payoff distributions. Current research emphasizes extending MABs to non-stationary environments, incorporating human trust and biases, and addressing computational challenges through algorithms such as Thompson Sampling and Upper Confidence Bound (UCB) variants, as well as novel architectures like Bandit Networks. These advances are improving diverse applications, including personalized recommendation, resource allocation, and financial portfolio optimization, by enabling more efficient and adaptive decision-making in complex, real-world settings.
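Because the overview centers on Thompson Sampling and UCB, the following is a minimal Python sketch of both algorithms in the standard Bernoulli-reward setting; the arm probabilities, horizon, and function names are illustrative assumptions and are not drawn from any of the listed papers.

```python
# Illustrative sketch (not from any listed paper): Thompson Sampling and UCB1
# on a simulated Bernoulli bandit. Arm means and horizon are made-up values.
import math
import random


def thompson_sampling(arm_means, horizon, rng):
    """Beta-Bernoulli Thompson Sampling: sample a mean from each arm's Beta
    posterior and pull the arm with the largest sampled value."""
    k = len(arm_means)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < arm_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward


def ucb1(arm_means, horizon, rng):
    """UCB1: pull the arm with the highest empirical mean plus an exploration
    bonus that shrinks as the arm is pulled more often."""
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    total_reward = 0
    for t in range(1, horizon + 1):
        if t <= k:  # pull each arm once to initialize its estimate
            arm = t - 1
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1 if rng.random() < arm_means[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.3, 0.5, 0.7]        # hypothetical arm payoff probabilities
    horizon = 10_000
    best = max(means) * horizon    # expected reward of always playing the best arm
    for name, algo in [("Thompson Sampling", thompson_sampling), ("UCB1", ucb1)]:
        reward = algo(means, horizon, rng)
        print(f"{name}: reward={reward}, approx. regret={best - reward:.0f}")
```

Thompson Sampling explores by sampling plausible arm means from its posterior, while UCB1 explores through an explicit optimism bonus; in a stationary setting both concentrate play on the best arm, which is the baseline the non-stationary and trust-aware extensions above build on.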
Papers
Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards
Ronald C. van den Broek, Rik Litjens, Tobias Sagis, Luc Siecker, Nina Verbeeke, Pratik Gajane
Containing a spread through sequential learning: to exploit or to explore?
Xingran Chen, Hesam Nikpey, Jungyeol Kim, Saswati Sarkar, Shirin Saeedi-Bidokhti
Active Velocity Estimation using Light Curtains via Self-Supervised Multi-Armed Bandits
Siddharth Ancha, Gaurav Pathak, Ji Zhang, Srinivasa Narasimhan, David Held
A Novel Demand Response Model and Method for Peak Reduction in Smart Grids -- PowerTAC
Sanjay Chandlekar, Arthik Boroju, Shweta Jain, Sujit Gujar