Policy Gradient
Policy gradient methods are a core component of reinforcement learning, aiming to optimize policies by directly estimating the gradient of expected cumulative rewards. Current research emphasizes improving sample efficiency and addressing challenges like high-dimensional state spaces and non-convex optimization landscapes through techniques such as residual policy learning, differentiable simulation, and novel policy architectures (e.g., tree-based, low-rank matrix models). These advancements are significant for both theoretical understanding of reinforcement learning algorithms and practical applications in robotics, control systems, and other domains requiring efficient and robust decision-making under uncertainty.
Papers
Matrix Low-Rank Approximation For Policy Gradient Methods
Sergio Rozada, Antonio G. Marques
Matrix Low-Rank Trust Region Policy Optimization
Sergio Rozada, Antonio G. Marques
Oracle-Efficient Reinforcement Learning for Max Value Ensembles
Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell