Average Reward Reinforcement Learning

Average reward reinforcement learning (RL) trains agents to maximize the long-run average reward per time step in continuing tasks, in contrast to discounted methods, which weight near-term rewards more heavily than distant ones. Current research develops and analyzes algorithms such as Q-learning and actor-critic methods, often incorporating techniques such as relative value iteration and multi-level Monte Carlo to improve convergence guarantees and efficiency, particularly for large or complex state spaces. The area matters because the average reward criterion is often better suited to continuing tasks than the discounted criterion, which can improve performance in real-world applications and deepen the theoretical understanding of RL algorithms.
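
To make the average reward objective and relative value iteration more concrete, here is a minimal sketch on a toy problem. Everything in it is an illustrative assumption rather than something taken from a specific paper: the two-state MDP, the function name relative_value_iteration, and the parameters ref_state, tol, and max_iters are all hypothetical. The sketch assumes a known model (transition probabilities P and rewards R); the Q-learning variants mentioned above would estimate the same quantities from samples instead.

```python
# Toy MDP (hypothetical): 2 states, 2 actions.
# Action 0 = "stay" in the current state, action 1 = "move" to the other state.
# P[s][a][s2] is the probability of landing in s2 after taking a in s.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay -> 0, move -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay -> 1, move -> 0
]
# R[s][a] is the immediate reward: staying pays 1 in state 0 and 2 in state 1,
# moving pays nothing.
R = [
    [1.0, 0.0],
    [2.0, 0.0],
]


def relative_value_iteration(P, R, ref_state=0, tol=1e-10, max_iters=10_000):
    """Estimate the optimal average reward (gain) and relative values h.

    Each sweep applies the undiscounted Bellman backup and then subtracts the
    backed-up value of a fixed reference state, which keeps h bounded.
    """
    n_states = len(R)
    n_actions = len(R[0])
    h = [0.0] * n_states
    gain = 0.0
    for _ in range(max_iters):
        # Undiscounted Bellman optimality backup for every state.
        backed_up = [
            max(
                R[s][a] + sum(P[s][a][s2] * h[s2] for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        # The backed-up value at the reference state is the gain estimate.
        gain = backed_up[ref_state]
        new_h = [v - gain for v in backed_up]
        delta = max(abs(x - y) for x, y in zip(new_h, h))
        h = new_h
        if delta < tol:
            break
    return gain, h


gain, h = relative_value_iteration(P, R)
print("estimated average reward per step:", gain)
print("relative values:", h)
```

In this toy problem the best long-run behavior is to reach state 1 and stay there, so the estimated gain should match the per-step reward of 2 collected in that state. Note that this plain scheme is only a sketch: it is not guaranteed to converge for every MDP (periodic chains, for example, typically need an aperiodicity transformation), which is part of why convergence analysis remains an active topic.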

Papers