Minimax Regret

Minimax regret, a key concept in online learning, is the worst-case difference between an algorithm's cumulative performance and that of the best fixed strategy chosen in hindsight; minimizing it yields guarantees that hold even against an adversarially chosen environment. Current research focuses on refining regret bounds for various models, including multi-armed bandits, contextual bandits, and reinforcement learning. It often employs algorithms such as Follow-the-Regularized-Leader (FTRL) and Thompson Sampling, and explores the impact of factors like feedback mechanisms, trust models, and causal structures. Understanding and minimizing minimax regret is crucial for designing robust and efficient algorithms in applications ranging from online advertising and recommendation systems to robotics and control. The field is actively developing tighter theoretical bounds and more efficient algorithms, bridging the gap between theoretical guarantees and practical performance.
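As a concrete illustration of the regret-in-hindsight notion above, the sketch below simulates a Bernoulli multi-armed bandit and measures the empirical regret of an ε-greedy learner against the best fixed arm in hindsight. The arm means, horizon, and ε are hypothetical values chosen purely for illustration; ε-greedy stands in for the more sophisticated algorithms (FTRL, Thompson Sampling) mentioned above.

```python
import random

def epsilon_greedy_regret(arm_means, horizon=10000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit and return the empirical
    regret versus always playing the best fixed arm in hindsight."""
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n          # pulls per arm
    estimates = [0.0] * n     # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if min(counts) == 0 or rng.random() < epsilon:
            arm = rng.randrange(n)                               # explore
        else:
            arm = max(range(n), key=lambda a: estimates[a])      # exploit
        r = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]     # incremental mean
        total_reward += r
    # Regret in hindsight: reward of the best fixed arm minus realized reward.
    return max(arm_means) * horizon - total_reward

# Hypothetical two-armed instance: means 0.5 and 0.9.
regret = epsilon_greedy_regret([0.5, 0.9])
print(f"empirical regret over 10000 rounds: {regret:.1f}")
```

Because ε-greedy explores at a constant rate, its regret here grows linearly in the horizon; minimax-optimal algorithms instead achieve sublinear (e.g., square-root) regret growth, which is precisely what the tighter bounds discussed above quantify.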

Papers