Primal-Dual Reinforcement Learning

Primal-dual reinforcement learning (RL) tackles constrained Markov decision processes (CMDPs) by casting the optimization as a saddle-point problem: a policy (the primal variable) is optimized to maximize reward while a Lagrange multiplier (the dual variable) prices constraint violations, balancing reward maximization against constraint satisfaction. Recent research focuses on provably efficient algorithms, such as primal-dual actor-critic methods and policy-gradient variants, that achieve optimal or near-optimal policies under various assumptions (e.g., linear function approximation, offline settings). These methods come with theoretical guarantees on sample complexity and convergence rates, and they address challenges in both online and offline RL, particularly for constrained problems and non-stationary environments, yielding more robust and reliable agents.
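
Concretely, the saddle point typically takes the Lagrangian form max over policies π, min over multipliers λ ≥ 0, of L(π, λ) = J_r(π) − λ(J_c(π) − b), where J_r and J_c are the expected reward and cost returns and b is the cost budget. The sketch below is a minimal illustration on a hypothetical two-armed constrained bandit; all numbers, variable names, and step sizes are assumptions chosen for illustration, not taken from any specific paper. The primal step runs policy-gradient ascent on the Lagrangian; the dual step adjusts λ in proportion to the constraint violation, projected to stay nonnegative.

```python
import numpy as np

# Toy constrained bandit (hypothetical numbers): arm 1 has the higher
# reward but also the higher cost; the constraint caps expected cost.
rewards = np.array([1.0, 2.0])   # expected reward per arm
costs   = np.array([0.0, 1.0])   # expected cost per arm
budget  = 0.5                    # constraint: E[cost] <= budget

theta = np.zeros(2)              # softmax policy parameters (primal variable)
lam   = 0.0                      # Lagrange multiplier (dual variable)
eta_primal, eta_dual = 0.5, 0.1  # illustrative step sizes

for t in range(2000):
    # Softmax policy over the two arms.
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()

    # Primal ascent: exact policy gradient of the Lagrangian
    # L(pi, lam) = E[reward] - lam * (E[cost] - budget).
    lagrangian_payoff = rewards - lam * costs
    grad_theta = pi * (lagrangian_payoff - pi @ lagrangian_payoff)
    theta += eta_primal * grad_theta

    # Dual step (projected to lam >= 0): raise lam when the constraint
    # is violated, lower it when there is slack.
    violation = pi @ costs - budget
    lam = max(0.0, lam + eta_dual * violation)

print("policy:", pi, "lambda:", lam, "E[cost]:", pi @ costs)
```

With small step sizes, gradient descent-ascent like this tends to oscillate around the saddle point, here a mixed policy that spends the full cost budget; provably convergent variants typically average the iterates or decay the step sizes.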

Papers