Feedback Graph

Feedback graphs model online learning problems in which playing one action also reveals the outcomes of related actions, as specified by the edges of a graph over the action set. Current research focuses on efficient algorithms, often variants of UCB or online mirror descent, that achieve optimal or near-optimal regret bounds under different graph structures (e.g., strongly or weakly observable graphs), in stochastic or adversarial environments, and with contextual information. The framework generalizes the classical bandit problem by exploiting dependencies between actions, which leads to more sample-efficient learning in applications such as inventory control, recommendation systems, and clinical trials.
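To make the role of the graph concrete, here is a minimal sketch of a UCB-style learner that uses side observations. It is an illustration under stated assumptions, not the method of any particular paper: rewards are assumed to be Bernoulli, the graph is assumed fixed and known, and the names `ucb_with_feedback_graph`, `means`, and `graph` are hypothetical. Playing action `i` reveals a reward sample for every action listed in `graph[i]`, so an action's confidence interval can shrink even when it is never played.

```python
import math
import random


def ucb_with_feedback_graph(means, graph, horizon, seed=0):
    """Run a UCB-style learner that updates every action observed via the graph.

    means   -- assumed Bernoulli reward means, one per action (hypothetical data)
    graph   -- graph[i] lists the actions whose reward is revealed when i is played
    horizon -- number of rounds to play
    Returns (plays per action, observations per action).
    """
    rng = random.Random(seed)
    k = len(means)
    plays = [0] * k   # how often each action was actually chosen
    obs = [0] * k     # how often each action's reward was observed
    sums = [0.0] * k  # sum of observed rewards per action

    for t in range(1, horizon + 1):
        def index(i):
            # Never-observed actions get an infinite index so they are tried first.
            if obs[i] == 0:
                return math.inf
            # The bonus depends on observation counts, not on play counts.
            return sums[i] / obs[i] + math.sqrt(2.0 * math.log(t) / obs[i])

        chosen = max(range(k), key=index)
        plays[chosen] += 1

        # Side observations: sample a reward for each out-neighbour of the chosen action.
        for j in graph[chosen]:
            reward = 1.0 if rng.random() < means[j] else 0.0
            obs[j] += 1
            sums[j] += reward

    return plays, obs
```

With self-loops only, e.g. `graph = {0: [0], 1: [1], 2: [2]}`, each action is observed exactly as often as it is played, which is the classical bandit setting. With a complete graph, e.g. `graph = {0: [0, 1, 2], 1: [0, 1, 2], 2: [0, 1, 2]}`, every round reveals all rewards, which is the full-information setting. Intermediate graphs fall between these two extremes.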

Papers