Variance-Dependent Regret

Variance-dependent regret in online learning aims to improve algorithmic performance by explicitly incorporating the variance of rewards or losses into regret bounds, rather than relying on worst-case ranges alone. Current research focuses on developing such algorithms across settings including multi-armed bandits, linear bandits, and reinforcement learning, often employing techniques like ensemble methods, adaptive Huber regression, and follow-the-regularized-leader. Variance awareness yields tighter regret bounds and improved performance, particularly when rewards have low variance: in the extreme case of (near-)deterministic rewards, variance-dependent guarantees can shrink to logarithmic or even constant regret, bridging the gap between worst-case and deterministic settings. The resulting algorithms offer enhanced efficiency and adaptability in applications such as recommendation systems and online decision-making.
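To make the idea concrete, below is a minimal sketch of a variance-aware multi-armed bandit policy in the spirit of UCB-V (Audibert, Munos, and Szepesvári, 2009), a classic algorithm with a variance-dependent regret bound. The function name `ucb_v`, the parameter `zeta`, and the toy arms are illustrative choices, not code from any of the papers below.

```python
import numpy as np

def ucb_v(reward_fns, horizon, zeta=1.2, b=1.0):
    """Variance-aware index policy in the style of UCB-V.

    `reward_fns` is a list of callables, one per arm, each returning a
    reward in [0, b]. The confidence bonus is Bernstein-type: its leading
    term scales with the arm's empirical variance, so low-variance arms
    need far fewer pulls to be ruled in or out.
    """
    k = len(reward_fns)
    counts = np.zeros(k)    # pulls per arm
    sums = np.zeros(k)      # running sum of rewards
    sq_sums = np.zeros(k)   # running sum of squared rewards
    history = []

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1     # pull each arm once to initialise
        else:
            means = sums / counts
            variances = np.maximum(sq_sums / counts - means**2, 0.0)
            log_term = zeta * np.log(t)
            # Bernstein-style bonus: a variance-driven leading term plus a
            # lower-order range correction that decays like 1/n.
            bonus = (np.sqrt(2.0 * variances * log_term / counts)
                     + 3.0 * b * log_term / counts)
            arm = int(np.argmax(means + bonus))
        r = reward_fns[arm]()
        counts[arm] += 1
        sums[arm] += r
        sq_sums[arm] += r**2
        history.append((arm, r))
    return history

# Toy usage: a zero-variance arm against a noisy Bernoulli arm.
rng = np.random.default_rng(0)
arms = [lambda: 0.5,                          # deterministic arm, variance 0
        lambda: float(rng.random() < 0.45)]   # Bernoulli(0.45), variance ~0.25
plays = ucb_v(arms, horizon=2000)
```

When an arm's empirical variance is near zero, its bonus is dominated by the `3 * b * log(t) / n` term, which decays like 1/n rather than the usual 1/sqrt(n); this faster shrinkage is what translates into the tighter, variance-dependent regret guarantee in the low-variance regime.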

Papers