Average Reward Markov Decision Processes
Average Reward Markov Decision Processes (AMDPs) model sequential decision-making problems in which the objective is to maximize the long-run average reward per time step, rather than a discounted cumulative sum. Current research emphasizes developing efficient algorithms, both model-based and model-free, with improved regret bounds and sample complexities, often leveraging techniques like policy gradient methods, value iteration, and function approximation over various function classes (e.g., linear, kernel). These advances deepen the theoretical understanding and broaden the practical applicability of reinforcement learning in fields such as robotics, control systems, and resource management, where long-term average performance is the natural objective.
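As a concrete illustration of the value-iteration family mentioned above, the following is a minimal sketch of relative value iteration for a hypothetical two-state, two-action AMDP. All transition probabilities and rewards here are invented for illustration; the key idea is that subtracting the value at a fixed reference state each sweep keeps the iterates bounded, which plain undiscounted value iteration does not.

```python
import numpy as np

# Hypothetical 2-state, 2-action average-reward MDP (illustrative numbers).
# P[s, a, s'] = transition probability, R[s, a] = expected one-step reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0
    [[0.5, 0.5], [0.1, 0.9]],   # transitions from state 1
])
R = np.array([
    [1.0, 0.0],   # rewards in state 0
    [2.0, 3.0],   # rewards in state 1
])

def relative_value_iteration(P, R, ref_state=0, tol=1e-9, max_iter=10_000):
    """Relative value iteration for an average-reward MDP.

    Returns the optimal gain rho (long-run average reward per step),
    a bias function h, and a greedy policy. Normalizing h at a fixed
    reference state each sweep keeps the iterates bounded.
    """
    n_states = P.shape[0]
    h = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = r(s, a) + sum_{s'} P(s' | s, a) * h(s')
        Q = R + P @ h
        h_new = Q.max(axis=1)
        rho = h_new[ref_state]      # current gain estimate
        h_new = h_new - rho         # normalize at the reference state
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = (R + P @ h).argmax(axis=1)
    return rho, h, policy

rho, h, policy = relative_value_iteration(P, R)
print(f"optimal average reward ~ {rho:.4f}, greedy policy = {policy}")
```

For this toy instance the greedy policy takes action 1 in both states, and the resulting chain's stationary distribution yields an average reward of 8/3 per step; the same recursion works for any finite unichain AMDP.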