Policy Gradient
Policy gradient methods are a core family of reinforcement learning algorithms that optimize a parameterized policy by ascending an estimate of the gradient of the expected cumulative reward with respect to the policy parameters. Current research emphasizes improving sample efficiency and addressing challenges such as high-dimensional state spaces and non-convex optimization landscapes through techniques including residual policy learning, differentiable simulation, and novel policy architectures (e.g., tree-based and low-rank matrix models). These advances matter both for the theoretical understanding of reinforcement learning algorithms and for practical applications in robotics, control systems, and other domains requiring efficient, robust decision-making under uncertainty.
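To make the basic idea concrete, below is a minimal sketch of the vanilla REINFORCE (score-function) policy gradient estimator on a hypothetical two-armed bandit. The environment, baseline, and hyperparameters are illustrative assumptions for this sketch and are not drawn from any of the papers listed below.

```python
# Minimal REINFORCE sketch on a toy two-armed bandit (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Toy "environment": arm 0 pays ~0.2 on average, arm 1 pays ~0.8.
def pull(arm):
    return rng.normal(loc=[0.2, 0.8][arm], scale=0.1)

# Softmax policy over the two arms, parameterized by logits theta.
def policy(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta = np.zeros(2)   # policy parameters
lr = 0.1              # learning rate (assumed value)
baseline = 0.0        # running-average baseline to reduce variance

for step in range(2000):
    probs = policy(theta)
    arm = rng.choice(2, p=probs)
    reward = pull(arm)

    # Score-function (likelihood-ratio) gradient for a softmax policy:
    # d/d theta_j log pi(a | theta) = 1[j == a] - pi_j
    grad_log_pi = -probs
    grad_log_pi[arm] += 1.0

    # REINFORCE update with baseline: theta += lr * (R - b) * grad log pi
    theta += lr * (reward - baseline) * grad_log_pi
    baseline += 0.05 * (reward - baseline)

print("final policy:", policy(theta))  # should strongly prefer arm 1
```

The same estimator extends to sequential settings by replacing the single reward with the (discounted) return following each action; the papers below study refinements of this idea, including constrained, multi-agent, and action-constrained variants.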
Papers
Do Transformer World Models Give Better Policy Gradients?
Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon
Convergence for Natural Policy Gradient on Infinite-State Average-Reward Markov Decision Processes
Isaac Grosof, Siva Theja Maguluri, R. Srikant
FlowPG: Action-constrained Policy Gradient with Normalizing Flows
Janaka Chathuranga Brahmanage, Jiajing Ling, Akshat Kumar
Behind the Myth of Exploration in Policy Gradients
Adrien Bolland, Gaspard Lambrechts, Damien Ernst
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo
Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property
Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm
Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration
Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou