Stochastic Policy Gradient
Stochastic Policy Gradient (SPG) methods optimize reinforcement learning policies by iteratively updating policy parameters along noisy estimates of the performance gradient. Current research focuses on improving the efficiency and convergence guarantees of SPG, exploring techniques such as heavy-ball momentum, negative momentum, and second-order methods (e.g., Stochastic Cubic Regularized Newton) to accelerate learning and improve sample complexity. These advances matter because they address the inherent non-convexity and high gradient variance of policy optimization, yielding more efficient and robust reinforcement learning algorithms for a wide range of control and decision-making problems.
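To make the idea concrete, here is a minimal sketch of SPG with heavy-ball momentum on a toy two-armed bandit: the gradient is estimated by Monte-Carlo (REINFORCE) sampling under a softmax policy, and the update adds a momentum term proportional to the previous parameter displacement. The bandit reward means, step size `alpha`, and momentum coefficient `beta` are illustrative assumptions, not values from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-armed bandit: expected reward per action (illustrative values).
TRUE_MEANS = np.array([0.2, 0.8])

def softmax(theta):
    z = theta - theta.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sample_step(theta):
    """Sample one action and its noisy reward under the softmax policy."""
    probs = softmax(theta)
    a = rng.choice(len(probs), p=probs)
    r = TRUE_MEANS[a] + 0.1 * rng.standard_normal()  # stochastic reward
    return a, r, probs

def reinforce_grad(theta, batch_size=16):
    """Monte-Carlo (REINFORCE) estimate of the policy gradient."""
    g = np.zeros_like(theta)
    for _ in range(batch_size):
        a, r, probs = sample_step(theta)
        # grad of log pi(a | theta) for a softmax policy: e_a - probs
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0
        g += r * grad_log_pi
    return g / batch_size

# SPG with heavy-ball momentum (gradient ascent on expected reward):
#   theta_{t+1} = theta_t + alpha * g_t + beta * (theta_t - theta_{t-1})
theta = np.zeros(2)
theta_prev = theta.copy()
alpha, beta = 0.5, 0.9  # step size and momentum coefficient (assumed values)
for t in range(200):
    g = reinforce_grad(theta)
    theta_next = theta + alpha * g + beta * (theta - theta_prev)
    theta_prev, theta = theta, theta_next

print("learned action probabilities:", softmax(theta))
```

Run as written, the policy concentrates probability on the higher-reward arm; the momentum term reuses past displacement to damp the variance-driven zig-zagging of plain SPG updates.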