Graph Bandit

Graph bandits extend the classic multi-armed bandit problem with a graph structure that encodes dependencies between actions (arms), capturing settings where choosing one action carries information about, or influences, the rewards of others. Current research focuses on algorithms based on Thompson Sampling and Upper Confidence Bounds (UCB), often combined with graph neural networks (GNNs) for reward prediction, that exploit graph properties to improve sample efficiency and regret bounds. The framework enables more realistic modeling of sequential decision-making in domains such as recommendation systems, clinical trials, and multi-agent systems, yielding algorithms with both theoretical guarantees and practical impact.
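
As one illustration of how a graph can be exploited, the sketch below runs a UCB1-style strategy under side-observation (feedback-graph) semantics, in which pulling an arm also reveals the rewards of its neighbors. This is a minimal, hypothetical example: the `graph_ucb` function, the Bernoulli arm means, and the path-graph neighborhood are illustrative assumptions, not an algorithm from any specific paper listed below.

```python
# Minimal sketch: UCB1 with graph (side-observation) feedback.
# Assumption: pulling arm i also reveals reward samples for i's neighbors
# in an undirected feedback graph; arms, means, and the graph are illustrative.

import math
import random


def graph_ucb(means, neighbors, horizon, seed=0):
    """Run UCB1 where each pull of arm i also observes rewards of i's neighbors."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n      # number of observations (pulls plus side observations) per arm
    sums = [0.0] * n      # cumulative observed reward per arm
    regret = 0.0
    best = max(means)

    for t in range(1, horizon + 1):
        # Play any still-unobserved arm first; afterwards pick the arm with the
        # highest UCB index built from all observations, not just direct pulls.
        unobserved = [i for i in range(n) if counts[i] == 0]
        if unobserved:
            arm = unobserved[0]
        else:
            arm = max(
                range(n),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )

        regret += best - means[arm]

        # Graph feedback: observe the pulled arm and every neighbor.
        for j in {arm, *neighbors[arm]}:
            reward = 1.0 if rng.random() < means[j] else 0.0  # Bernoulli reward
            counts[j] += 1
            sums[j] += reward

    return regret


if __name__ == "__main__":
    means = [0.2, 0.5, 0.7, 0.4]                        # hypothetical Bernoulli means
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph over the arms
    print("cumulative regret:", graph_ucb(means, neighbors, horizon=2000))
```

Because each pull yields observations for several arms, the confidence intervals shrink faster than in the standard bandit, which is one intuition behind the improved regret bounds that graph-aware algorithms aim for.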

Papers