Average Reward Markov Decision Process

Average Reward Markov Decision Processes (AMDPs) model sequential decision-making problems in which the goal is to find policies that maximize the long-run average reward per time step, rather than a discounted sum. Current research emphasizes efficient model-based and model-free algorithms with improved regret bounds and sample-complexity guarantees, often leveraging techniques such as policy gradient methods, value iteration, and function approximation over various model classes (e.g., linear or kernel-based). These advances strengthen both the theoretical understanding and the practical applicability of reinforcement learning in diverse fields, including robotics, control systems, and resource management, where long-term average performance is crucial.
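
The standard dynamic-programming method for this setting is relative value iteration, the average-reward counterpart of the value iteration mentioned above: it iterates the Bellman backup and subtracts the value at a fixed reference state so that the iterates stay bounded and the subtracted quantity converges to the optimal gain (average reward). The sketch below is a minimal tabular illustration under the assumption of a known transition tensor P and reward matrix r; the function name, array shapes, and the toy two-state example are illustrative choices, not taken from any particular paper.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-8, max_iter=10_000):
    """Relative value iteration for a tabular average-reward MDP.

    P: transition tensor of shape (S, A, S), P[s, a, s'] = Pr(s' | s, a)
    r: reward matrix of shape (S, A)
    Returns the estimated gain rho, the bias (relative value) function h,
    and a greedy policy, assuming the MDP is unichain and aperiodic.
    """
    S, A = r.shape
    h = np.zeros(S)
    ref = 0  # arbitrary reference state used to pin down h
    for _ in range(max_iter):
        # Bellman backup: Q[s, a] = r[s, a] + sum_s' P[s, a, s'] * h[s']
        Q = r + P @ h              # shape (S, A)
        h_new = Q.max(axis=1)
        rho = h_new[ref]           # gain estimate from the reference state
        h_new = h_new - rho        # keep iterates bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmax(axis=1)
    return rho, h, policy

# Toy two-state, two-action example (hypothetical numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
rho, h, policy = relative_value_iteration(P, r)
print(rho, policy)
```

Subtracting the reference-state value each iteration is what distinguishes the average-reward update from discounted value iteration: without it, the iterates would grow roughly linearly at the rate of the optimal gain instead of converging.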

Papers