Dynamic Regret

Dynamic regret is a stronger performance metric than static regret: it measures an online learning algorithm's cumulative loss against the best *sequence* of decisions in hindsight, rather than against a single best fixed decision. Current research focuses on algorithms with optimal dynamic regret bounds across online learning settings such as contextual bandits, online linear regression, and Markov Decision Processes (MDPs), often employing techniques like discounted forecasting, gradient tracking, and optimistic online mirror descent. These advances matter because they yield algorithms that remain robust and adaptive in non-stationary environments, improving performance in applications ranging from online control systems to personalized recommendations.
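
For reference, dynamic regret over $T$ rounds is typically defined against an arbitrary comparator sequence $u_1, \dots, u_T$ as

$$\text{D-Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t),$$

where $x_t$ is the learner's decision and $f_t$ the loss revealed in round $t$; choosing $u_t \equiv u$ recovers static regret, and typical bounds scale with the path length $P_T = \sum_{t=2}^{T} \lVert u_t - u_{t-1} \rVert$ of the comparator sequence.

As a minimal illustration (not drawn from any particular paper listed below), the sketch runs plain online gradient descent on a stream of drifting quadratic losses and tracks dynamic regret against the per-round minimizers; the loss family, drift model, and step size `eta` are assumptions made only for this example.

```python
import numpy as np

# Sketch: online gradient descent on drifting quadratic losses
# f_t(x) = 0.5 * (x - theta_t)^2, with dynamic regret measured against
# the per-round minimizers u_t = theta_t (all choices here are illustrative).

rng = np.random.default_rng(0)
T = 1000
theta = np.cumsum(0.05 * rng.standard_normal(T))  # slowly drifting comparators

eta = 0.1            # step size (assumed; in practice tuned to the path length)
x = 0.0              # learner's initial decision
dynamic_regret = 0.0

for t in range(T):
    learner_loss = 0.5 * (x - theta[t]) ** 2   # learner's loss this round
    comparator_loss = 0.0                      # u_t = theta_t incurs zero loss
    dynamic_regret += learner_loss - comparator_loss
    grad = x - theta[t]                        # gradient of f_t at the played x
    x -= eta * grad                            # online gradient descent update

print(f"Dynamic regret after {T} rounds: {dynamic_regret:.2f}")
```

Because the comparator is allowed to move every round, the achievable dynamic regret necessarily depends on how much the environment drifts, which is why bounds are usually stated in terms of the path length $P_T$ rather than $T$ alone.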

Papers