Regret Bound
Regret bound analysis quantifies the performance of online learning algorithms, particularly in settings such as multi-armed bandits and reinforcement learning, by measuring the gap between an algorithm's cumulative reward and the cumulative reward of an optimal strategy chosen in hindsight. Current research emphasizes developing algorithms with tighter regret bounds, often employing techniques like optimism in the face of uncertainty, Thompson sampling, and exploration strategies tailored to specific problem structures (e.g., linear models, contextual bandits). Tighter bounds translate directly into more sample-efficient decision-making under uncertainty, with applications in personalized recommendations, online advertising, and resource allocation.
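As a concrete illustration of the regret notion described above, the following is a minimal sketch (not taken from any of the listed papers) of the classic UCB1 algorithm on Bernoulli-reward arms. It implements "optimism in the face of uncertainty" by adding a confidence bonus to each arm's empirical mean, and it reports the cumulative pseudo-regret, i.e. the horizon times the best arm's mean minus the sum of the means of the arms actually pulled. The function name `ucb1_regret` and the specific arm means are illustrative choices, not from the source.

```python
import math
import random

def ucb1_regret(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return cumulative pseudo-regret.

    Pseudo-regret: R_T = T * max(means) - sum of means of pulled arms.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms          # pulls per arm
    sums = [0.0] * n_arms          # total observed reward per arm
    pulled_mean_total = 0.0        # sum of true means of pulled arms

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull each arm once
        else:
            # Optimism in the face of uncertainty:
            # empirical mean plus a confidence bonus that shrinks
            # as an arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulled_mean_total += means[arm]

    return horizon * max(means) - pulled_mean_total
```

Because the confidence bonus forces only logarithmically many pulls of suboptimal arms, the pseudo-regret grows like O(log T) rather than linearly in the horizon, which is the kind of guarantee a regret bound formalizes.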
Papers
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
Xutong Liu, Siwei Wang, Jinhang Zuo, Han Zhong, Xuchuang Wang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, John C.S. Lui, Wei Chen
Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
Victor Boone, Zihan Zhang
Sparsity-Agnostic Linear Bandits with Adaptive Adversaries
Tianyuan Jin, Kyoungseok Jang, Nicolò Cesa-Bianchi