Policy Gradient
Policy gradient methods are a core family of reinforcement learning algorithms that optimize a parameterized policy by directly estimating the gradient of the expected cumulative reward with respect to the policy parameters. Current research emphasizes improving sample efficiency and addressing challenges such as high-dimensional state spaces and non-convex optimization landscapes through techniques like residual policy learning, differentiable simulation, and novel policy architectures (e.g., tree-based and low-rank matrix models). These advances matter both for the theoretical understanding of reinforcement learning algorithms and for practical applications in robotics, control systems, and other domains that require efficient, robust decision-making under uncertainty.
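To make the core idea concrete: the canonical REINFORCE estimator updates the policy parameters theta along (R - b) * grad log pi_theta(a), where R is a sampled return and b is a variance-reducing baseline. Below is a minimal, self-contained sketch on a toy three-armed bandit; the arm reward means, learning rate, and baseline schedule are illustrative assumptions, not drawn from any of the papers listed below.

```python
import numpy as np

# Minimal REINFORCE sketch on a 3-armed bandit (illustrative only).
# The policy is a softmax over per-arm preferences theta; the gradient
# of log pi(a) with respect to theta is (one_hot(a) - probs).

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # assumed per-arm reward means
theta = np.zeros(3)                      # policy parameters (preferences)
lr = 0.1                                 # assumed learning rate
baseline = 0.0                           # running-average reward baseline

for step in range(2000):
    # Softmax policy (shifted by max for numerical stability).
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    a = rng.choice(3, p=probs)           # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)   # sample a noisy reward

    # REINFORCE update: theta += lr * (r - b) * grad log pi(a).
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * (r - baseline) * grad_log_pi

    baseline += 0.05 * (r - baseline)    # move baseline toward mean reward

print("learned policy:", np.round(probs, 3))  # should favor arm 2
```

The baseline subtraction leaves the gradient estimate unbiased while reducing its variance, the same variance concern that motivates several of the papers listed below.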
Papers
Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient
Wubing Chen, Wenbin Li, Xiao Liu, Shangdong Yang, Yang Gao
Long N-step Surrogate Stage Reward to Reduce Variances of Deep Reinforcement Learning in Complex Problems
Junmin Zhong, Ruofan Wu, Jennie Si
A policy gradient approach for Finite Horizon Constrained Markov Decision Processes
Soumyajit Guin, Shalabh Bhatnagar
Training Efficient Controllers via Analytic Policy Gradient
Nina Wiedemann, Valentin Wüest, Antonio Loquercio, Matthias Müller, Dario Floreano, Davide Scaramuzza
Improving Document Image Understanding with Reinforcement Finetuning
Bao-Sinh Nguyen, Dung Tien Le, Hieu M. Vu, Tuan Anh D. Nguyen, Minh-Tien Nguyen, Hung Le