Dependent Regret

Dependent regret in online learning focuses on minimizing the difference between an algorithm's cumulative loss and that of a comparator, with regret bounds that can depend on the observed data rather than only on worst-case quantities such as the time horizon. Current research emphasizes algorithms with such data-dependent regret bounds, often employing techniques like Thompson sampling, optimistic online learning, and follow-the-regularized-leader methods, adapting to both stochastic and adversarial environments, and incorporating variance information for tighter guarantees. These advances matter because they yield more efficient and robust algorithms for applications such as multi-armed and contextual bandits, improving decision-making under uncertainty.
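
As a concrete illustration, the minimal sketch below runs exponential weights (follow-the-regularized-leader with an entropic regularizer) over a set of experts, using a learning rate that adapts to the losses observed so far, and reports the empirical regret against the best fixed expert, which is exactly the quantity a data-dependent bound controls. The adaptive rate rule and the toy loss data are illustrative assumptions, not a reproduction of any specific published algorithm.

```python
import numpy as np

def adaptive_hedge(loss_matrix):
    """Exponential weights / Hedge with a data-dependent learning rate.

    Illustrative sketch: the rate shrinks with the cumulative squared
    loss range seen so far, so the method adapts to the observed data
    rather than to a fixed worst-case horizon.
    """
    T, K = loss_matrix.shape
    cum_loss = np.zeros(K)   # cumulative loss of each expert
    alg_loss = 0.0           # algorithm's cumulative (expected) loss
    sq_range = 0.0           # running sum of squared per-round loss ranges
    for t in range(T):
        # Data-dependent learning rate: smaller when observed variability is large.
        eta = np.sqrt(np.log(K) / (1.0 + sq_range))
        w = np.exp(-eta * (cum_loss - cum_loss.min()))  # shift for numerical stability
        p = w / w.sum()                                 # play the Hedge distribution
        losses = loss_matrix[t]
        alg_loss += p @ losses
        cum_loss += losses
        sq_range += (losses.max() - losses.min()) ** 2
    # Regret relative to the best fixed expert in hindsight.
    return alg_loss - cum_loss.min()

# Toy run: 1000 rounds, 5 experts, i.i.d. losses in [0, 1].
rng = np.random.default_rng(0)
print(adaptive_hedge(rng.random((1000, 5))))
```

Because the learning rate is tied to the observed loss range rather than to the round count, the resulting guarantee scales with the variability of the data, which is the kind of behavior data-dependent regret bounds aim to capture.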

Papers