Policy Gradient
Policy gradient methods are a core component of reinforcement learning: they optimize a policy directly by estimating the gradient of the expected cumulative reward with respect to the policy's parameters. Current research emphasizes improving sample efficiency and addressing challenges such as high-dimensional state spaces and non-convex optimization landscapes, using techniques like residual policy learning, differentiable simulation, and novel policy architectures (e.g., tree-structured and low-rank matrix models). These advances matter both for the theoretical understanding of reinforcement learning algorithms and for practical applications in robotics, control systems, and other domains that require efficient, robust decision-making under uncertainty.
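As a concrete illustration of "directly estimating the gradient," the sketch below implements the classic score-function (REINFORCE) estimator, grad_theta J(theta) = E[ R * grad_theta log pi_theta(a) ], on a toy two-armed bandit with a softmax policy. It is a minimal sketch, not code from any of the papers listed below; the arm payoffs, learning rate, and seed are illustrative assumptions.

```python
# Minimal REINFORCE sketch on a two-armed bandit (illustrative assumptions
# throughout; not drawn from the papers listed in this section).
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])  # assumed mean payoff of each arm
theta = np.zeros(2)                # parameters of a softmax policy
alpha = 0.1                        # assumed learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)           # sample an action from pi_theta
    r = rng.normal(true_means[a], 0.1)   # stochastic reward for that action
    # Score-function gradient: for a softmax policy,
    # grad_theta log pi(a) = one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += alpha * r * grad_log_pi     # ascend the estimated gradient

print(softmax(theta))  # probability mass should concentrate on arm 1
```

Subtracting a baseline from r (e.g., a running average of rewards) reduces the variance of this estimator without biasing it; much of the sample-efficiency work referenced above builds on that same variance-reduction idea.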
Papers
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Yifei Zhou, Ayush Sekhari, Yuda Song, Wen Sun
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
Nicholas E. Corrado, Josiah P. Hanna
Deep Reinforcement Learning for 2D Physics-Based Object Manipulation in Clutter
Luca Renna