Policy Gradient
Policy gradient methods are a core family of reinforcement learning algorithms that optimize a policy by directly estimating the gradient of the expected cumulative reward with respect to the policy parameters. Current research emphasizes improving sample efficiency and addressing challenges such as high-dimensional state spaces and non-convex optimization landscapes through techniques including residual policy learning, differentiable simulation, and novel policy architectures (e.g., tree-based and low-rank matrix models). These advances matter both for the theoretical understanding of reinforcement learning algorithms and for practical applications in robotics, control systems, and other domains requiring efficient, robust decision-making under uncertainty.
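To make the core idea concrete, below is a minimal sketch of the classic REINFORCE estimator, which updates parameters along G_t * grad log pi(a_t | s_t), where G_t is the discounted return-to-go. The two-state toy environment, step size, and horizon are illustrative assumptions for this sketch only; they are not drawn from any of the papers listed below.

```python
# Minimal REINFORCE sketch (illustrative; the toy 2-state MDP and all
# hyperparameters are assumptions, not taken from the papers listed here).
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.99
theta = np.zeros((N_STATES, N_ACTIONS))  # tabular softmax policy parameters

def policy(state):
    """Softmax action probabilities for one state."""
    logits = theta[state]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def step(state, action):
    """Hand-crafted toy dynamics: action 0 stays put, action 1 switches
    states; reward 1 only for taking action 1 in state 0."""
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    next_state = state if action == 0 else 1 - state
    return next_state, reward

def run_episode(horizon=20):
    state, traj = 0, []
    for _ in range(horizon):
        action = rng.choice(N_ACTIONS, p=policy(state))
        next_state, reward = step(state, action)
        traj.append((state, action, reward))
        state = next_state
    return traj

alpha = 0.1  # step size (illustrative choice)
for episode in range(500):
    traj = run_episode()
    G = 0.0
    # Walk the trajectory backwards so G accumulates the return-to-go.
    for state, action, reward in reversed(traj):
        G = reward + GAMMA * G
        grad_log = -policy(state)      # d log pi(a|s) / d theta[s, :]
        grad_log[action] += 1.0        # ... equals one_hot(a) - probs
        theta[state] += alpha * G * grad_log  # ascend the estimated gradient

print("learned action probabilities in state 0:", policy(0))
```

Under these toy dynamics the optimal behavior is to always pick action 1 (collecting a reward every other step), and the printed probabilities should concentrate on it; in practice this vanilla estimator has high variance, which is one motivation for the variance-reduction work listed below.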
Papers
Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient
Wubing Chen, Wenbin Li, Xiao Liu, Shangdong Yang, Yang Gao
Long N-step Surrogate Stage Reward to Reduce Variances of Deep Reinforcement Learning in Complex Problems
Junmin Zhong, Ruofan Wu, Jennie Si
A policy gradient approach for Finite Horizon Constrained Markov Decision Processes
Soumyajit Guin, Shalabh Bhatnagar