Policy Gradient
Policy gradient methods are a core component of reinforcement learning: they optimize a policy directly by estimating the gradient of the expected cumulative reward with respect to the policy parameters. Current research emphasizes improving sample efficiency and addressing challenges such as high-dimensional state spaces and non-convex optimization landscapes through techniques like residual policy learning, differentiable simulation, and novel policy architectures (e.g., tree-based and low-rank matrix models). These advances matter both for the theoretical understanding of reinforcement learning algorithms and for practical applications in robotics, control systems, and other domains requiring efficient, robust decision-making under uncertainty.
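To make the core idea concrete, below is a minimal sketch of the classic REINFORCE (likelihood-ratio) gradient estimator, which underlies the methods above. It is not taken from any of the listed papers: the two-armed bandit environment, the softmax policy over logits `theta`, and the constants are illustrative assumptions chosen so the example runs self-contained.

```python
import numpy as np

# Toy setup (illustrative assumption, not from any listed paper): a 2-armed
# bandit where action 1 pays +1 with probability 0.8 and action 0 pays +1
# with probability 0.2. The policy is a softmax over per-action logits theta.
rng = np.random.default_rng(0)
reward_probs = np.array([0.2, 0.8])
theta = np.zeros(2)          # policy parameters (logits)
alpha = 0.1                  # learning rate
baseline = 0.0               # running-average baseline for variance reduction

def policy(theta):
    """Softmax policy: pi(a) = exp(theta_a) / sum_b exp(theta_b)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

for step in range(2000):
    probs = policy(theta)
    a = rng.choice(2, p=probs)                 # sample an action from pi
    r = float(rng.random() < reward_probs[a])  # observe the reward

    # Score-function (likelihood-ratio) gradient of log pi(a):
    # for a softmax policy, grad_theta log pi(a) = one_hot(a) - pi.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0

    # REINFORCE estimate of grad_theta E[R]: (R - baseline) * grad log pi(a).
    theta += alpha * (r - baseline) * grad_log_pi
    baseline += 0.05 * (r - baseline)          # update the running baseline

print("learned action probabilities:", policy(theta))  # should favor action 1
```

The baseline subtraction is the simplest form of the variance-reduction theme several of the papers below develop further (e.g., tree-search expansion in SoftTreeMax); it leaves the gradient estimate unbiased while shrinking its variance.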
Papers
SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search
Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik
Learning Control from Raw Position Measurements
Fabio Amadio, Alberto Dalla Libera, Daniel Nikovski, Ruggero Carli, Diego Romeres
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Carlo Alfano, Rui Yuan, Patrick Rebeschini
Learning the Kalman Filter with Fine-Grained Sample Complexity
Xiangyuan Zhang, Bin Hu, Tamer Başar