Average Reward
Average-reward reinforcement learning focuses on maximizing the long-run average reward per time step in sequential decision-making problems, offering an alternative to the more commonly used discounted-reward framework. Current research emphasizes developing efficient algorithms, such as Q-learning variants (including RVI and optimistic Q-learning), actor-critic methods (like RVI-SAC and ARO-DDPG), and model-based approaches, often addressing challenges posed by partially observable environments and large state spaces. These advances matter for real-world problems where long-term average performance is paramount, extending the applicability of reinforcement learning to continuous control tasks and other domains with ongoing interactions.
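To make the average-reward objective concrete, the sketch below illustrates RVI (relative value iteration) Q-learning, one of the Q-learning variants mentioned above. Instead of discounting, each update subtracts a reference value f(Q) that converges to the optimal average reward. The two-state MDP, the choice of reference pair (0, 0), and all hyperparameters are hypothetical illustrations, not taken from any specific paper.

```python
import numpy as np

# Hypothetical 2-state deterministic MDP (illustration only):
# state 0: action 0 -> reward 1, stay;     action 1 -> reward 0, go to state 1
# state 1: action 0 -> reward 3, go to 0;  action 1 -> reward 0, stay
# The optimal policy cycles 0 -> 1 -> 0 with average reward (0 + 3) / 2 = 1.5.
P = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}   # next state
R = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 3.0, (1, 1): 0.0}  # reward

rng = np.random.default_rng(0)
Q = np.zeros((2, 2))
alpha = 0.05        # step size
epsilon = 0.1       # exploration rate
s = 0
for _ in range(200_000):
    # epsilon-greedy action selection
    a = int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(Q[s]))
    s2, r = P[(s, a)], R[(s, a)]
    # RVI update: no discount factor; the reference value f(Q) = Q[0, 0]
    # anchors the Q-values and converges to the optimal average reward.
    gain = Q[0, 0]
    Q[s, a] += alpha * (r - gain + Q[s2].max() - Q[s, a])
    s = s2

print(Q[0, 0])  # estimated average reward; should settle near 1.5 here
```

The key design point is the `- gain` term: in the discounted setting, the discount factor keeps values bounded, whereas here subtracting the reference value f(Q) plays that role, and f(Q) itself becomes the estimate of the long-run average reward per step.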