Proximal Policy Optimization
Proximal Policy Optimization (PPO) is a reinforcement learning algorithm used to train agents to make optimal decisions in complex environments, and current research focuses on improving its efficiency and robustness. Recent work explores enhancements such as refined credit assignment (e.g., VinePPO), the incorporation of human feedback and safety mechanisms (e.g., HI-PPO, PRPO), and improved sample efficiency and performance in high-dimensional spaces through techniques such as diffusion model integration. These advances matter for applications including robotics, autonomous systems, and large language model alignment, where PPO's ability to learn effective policies from interaction with the environment is crucial.
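The enhancements above all build on PPO's core update rule, the clipped surrogate objective from Schulman et al. (2017). Below is a minimal, hedged sketch of that objective in PyTorch; the function and parameter names (ppo_clip_loss, clip_eps) are illustrative and not taken from any of the papers listed here.

```python
# Minimal sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
# Assumes per-timestep log-probabilities and advantage estimates are available
# as PyTorch tensors; names here are illustrative, not from the papers below.
import torch


def ppo_clip_loss(new_log_probs: torch.Tensor,
                  old_log_probs: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate policy loss, averaged over the batch."""
    # Probability ratio between the current policy and the data-collecting policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogates; PPO takes the pessimistic minimum
    # to discourage overly large policy updates.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negate because optimizers minimize, while the objective is maximized.
    return -torch.min(unclipped, clipped).mean()


if __name__ == "__main__":
    # Example usage with dummy data for a batch of four timesteps.
    new_lp = torch.tensor([-0.9, -1.1, -0.8, -1.3])
    old_lp = torch.tensor([-1.0, -1.0, -1.0, -1.0])
    adv = torch.tensor([0.5, -0.2, 1.0, 0.1])
    print(ppo_clip_loss(new_lp, old_lp, adv))
```

In practice this policy loss is combined with a value-function loss and an entropy bonus, and optimized over several epochs of minibatches drawn from the most recent rollout; the clipping range clip_eps is the main knob controlling how far each update may move the policy.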
Papers
Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems
Jake Welde, Nishanth Rao, Pratik Kunapuli, Dinesh Jayaraman, Vijay Kumar
On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration
Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization
Gao Tianci, Dmitriev D. Dmitry, Konstantin A. Neusypin, Yang Bo, Rao Shengren
Solving Integrated Process Planning and Scheduling Problem via Graph Neural Network Based Deep Reinforcement Learning
Hongpei Li, Han Zhang, Ziyan He, Yunkai Jia, Bo Jiang, Xiang Huang, Dongdong Ge