Heavy-Tailed Reward

Heavy-tailed reward distributions produce extreme reward values far more often than light-tailed (for example, sub-Gaussian) models assume. They pose significant challenges in reinforcement learning (RL): standard algorithms that rely on empirical means can be dominated by a few outliers, so they may converge slowly, fail to converge, or perform poorly. Current research focuses on robust algorithms for several RL settings, including bandits and Markov decision processes, that handle heavy-tailed rewards effectively. These algorithms often use truncation, median-of-means estimation, or adaptive Huber regression to limit the influence of outliers while still achieving near-optimal performance. This work matters for applying RL to real-world domains where heavy-tailed rewards are common, such as finance and online advertising.
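To make two of the techniques named above concrete, the following is a minimal Python sketch of a truncated mean and a median-of-means estimator. It is illustrative only and not taken from any particular paper: the function names, the fixed `threshold`, and the contiguous block split are assumptions. Published algorithms typically choose the threshold and the number of blocks from the sample size, a moment bound, and a confidence level. Adaptive Huber regression is not shown.

```python
from statistics import mean, median


def truncated_mean(rewards, threshold):
    """Average the rewards after replacing any sample whose magnitude
    exceeds `threshold` with 0 (a simple form of truncation)."""
    kept = [r if abs(r) <= threshold else 0.0 for r in rewards]
    return mean(kept)


def median_of_means(rewards, num_blocks):
    """Split the samples into `num_blocks` contiguous blocks of nearly
    equal size, average each block, and return the median of the averages."""
    n = len(rewards)
    if not 1 <= num_blocks <= n:
        raise ValueError("num_blocks must be between 1 and len(rewards)")
    # Block boundaries; every block is non-empty because num_blocks <= n.
    bounds = [i * n // num_blocks for i in range(num_blocks + 1)]
    block_means = [
        mean(rewards[bounds[i]:bounds[i + 1]]) for i in range(num_blocks)
    ]
    return median(block_means)


# Hypothetical sample: eight ordinary rewards and one extreme outlier.
rewards = [1.0] * 8 + [1000.0]
plain = mean(rewards)                           # pulled far above 1 by the outlier
robust_trunc = truncated_mean(rewards, 10.0)    # outlier replaced by 0
robust_mom = median_of_means(rewards, 3)        # outlier confined to one block
```

In this sample the plain mean is dominated by the single outlier, while both robust estimates stay close to the typical reward. The contiguous split assumes the samples are independent and in no meaningful order. If that assumption does not hold, shuffling before splitting is one common workaround.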

Papers