Stochastic Policy Gradient

Stochastic Policy Gradient (SPG) methods optimize policies in reinforcement learning by iteratively updating policy parameters with noisy estimates of the performance gradient. Current research focuses on improving the efficiency and convergence guarantees of SPG, exploring momentum variants (e.g., heavy-ball and negative momentum) and second-order methods (e.g., Stochastic Cubic Regularized Newton) to accelerate learning and improve sample complexity. These advances matter because they address the non-convexity and high gradient variance inherent in policy optimization, yielding more efficient and robust reinforcement learning algorithms for control and decision-making problems.
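
To make the basic idea concrete, here is a minimal sketch of a stochastic policy gradient (REINFORCE-style) update combined with heavy-ball momentum, one of the acceleration techniques mentioned above. The toy bandit environment, the step size, and the momentum coefficient are illustrative assumptions chosen for this example, not taken from any specific paper.

```python
import numpy as np

# Sketch: REINFORCE (stochastic policy gradient) with heavy-ball momentum
# on a toy 3-armed bandit. All numeric settings below are assumptions.

rng = np.random.default_rng(0)

true_means = np.array([0.1, 0.5, 0.9])   # assumed reward means per arm
theta = np.zeros(3)                      # softmax policy parameters
velocity = np.zeros(3)                   # heavy-ball momentum buffer
alpha, beta = 0.1, 0.9                   # step size and momentum (assumed)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for t in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)                 # sample action from the policy
    r = rng.normal(true_means[a], 1.0)         # noisy reward signal

    # Score-function gradient estimate: for a softmax policy,
    # grad_theta log pi(a) = e_a - probs, so g = r * grad log pi
    # is an unbiased but noisy estimate of the policy gradient.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    g = r * grad_log_pi

    # Heavy-ball momentum: accumulate a velocity, then ascend along it.
    velocity = beta * velocity + g
    theta += alpha * velocity

print("final policy:", softmax(theta))         # should favor the best arm
```

The momentum buffer averages successive noisy gradient estimates, which both damps their variance and accelerates progress along directions of persistent ascent; this is the intuition behind the momentum-based SPG variants studied in the literature.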

Papers