Multi-Armed Bandit
Multi-armed bandits (MABs) are a framework for sequential decision-making under uncertainty, in which a learner maximizes cumulative reward by repeatedly selecting actions (arms) with unknown payoff distributions. Current research emphasizes extending MABs to non-stationary environments, incorporating human trust and biases, and addressing computational challenges through algorithms such as Thompson Sampling and Upper Confidence Bound (UCB) variants, as well as novel architectures like Bandit Networks. These advances are driving improvements in diverse applications, including personalized recommendations, resource allocation, and financial portfolio optimization, by enabling more efficient and adaptive decision-making in complex, real-world scenarios.
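To make the framework concrete, here is a minimal sketch of the classic UCB1 strategy, one instance of the Upper Confidence Bound family mentioned above. The `pull_arm` callback and the Bernoulli arm probabilities in the usage example are illustrative assumptions, not drawn from any of the papers listed below.

```python
import math
import random

def ucb1(pull_arm, n_arms, n_rounds):
    """Minimal UCB1 sketch: pull_arm(i) returns a stochastic reward in [0, 1]."""
    counts = [0] * n_arms      # number of times each arm has been pulled
    values = [0.0] * n_arms    # running mean reward per arm
    total_reward = 0.0
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1        # play each arm once to initialize its estimate
        else:
            # pick the arm maximizing (estimated mean + exploration bonus)
            arm = max(range(n_arms),
                      key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull_arm(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return total_reward, values

# Usage with three hypothetical Bernoulli arms
probs = [0.2, 0.5, 0.8]
reward, estimates = ucb1(lambda i: 1.0 if random.random() < probs[i] else 0.0,
                         n_arms=3, n_rounds=10_000)
```

The exploration bonus shrinks as an arm is pulled more often, so play concentrates on the empirically best arm while still occasionally revisiting under-sampled ones.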
Papers
Screw Geometry Meets Bandits: Incremental Acquisition of Demonstrations to Generate Manipulation Plans
Dibyendu Das, Aditya Patankar, Nilanjan Chakraborty, C.R. Ramakrishnan, I.V. Ramakrishnan
Optimal Streaming Algorithms for Multi-Armed Bandits
Tianyuan Jin, Keke Huang, Jing Tang, Xiaokui Xiao
Bridging Swarm Intelligence and Reinforcement Learning
Karthik Soma, Yann Bouteiller, Heiko Hamann, Giovanni Beltrame