Policy Actor Critic

Policy Actor-Critic (PAC) methods are a class of reinforcement learning algorithms that learn a policy (the actor) and a value function (the critic) at the same time: the critic estimates how good the actor's decisions are, and the actor is updated using that estimate. Current research focuses on improving the sample efficiency and robustness of off-policy PAC algorithms through techniques such as multi-step learning, pessimism/optimism control, and unique experience replay, which aim to make better use of collected data and to mitigate overestimation bias. These advances matter for continuous control tasks and for applications in robotics, autonomous driving, and other domains that require efficient learning in complex, high-dimensional environments.
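To make the actor/critic split concrete, here is a minimal sketch of a tabular one-step actor-critic update. It is not taken from any of the papers listed on this page. The corridor environment, the function name train_actor_critic, and all hyperparameters are assumptions chosen for illustration, and the update is the simple on-policy variant rather than the off-policy algorithms discussed above.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array of preferences."""
    z = np.exp(x - x.max())
    return z / z.sum()


def train_actor_critic(n_states=5, episodes=500, gamma=0.95,
                       actor_lr=0.1, critic_lr=0.2, max_steps=100, seed=0):
    """One-step actor-critic on a hypothetical corridor task.

    States 0..n_states-1 lie on a line. Action 0 moves left (clamped at 0),
    action 1 moves right. Reaching the last state ends the episode with
    reward 1; every other transition gives reward 0.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros((n_states, 2))  # actor: action preferences per state
    v = np.zeros(n_states)           # critic: state-value estimates
    goal = n_states - 1

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            probs = softmax(theta[s])
            a = rng.choice(2, p=probs)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == goal
            r = 1.0 if done else 0.0

            # Critic: the TD error measures how much better or worse the
            # transition was than the current value estimate predicted.
            target = r + (0.0 if done else gamma * v[s_next])
            delta = target - v[s]
            v[s] += critic_lr * delta

            # Actor: policy-gradient step weighted by the TD error.
            # For a softmax policy, grad log pi(a|s) = onehot(a) - probs.
            grad_log = -probs
            grad_log[a] += 1.0
            theta[s] += actor_lr * delta * grad_log

            s = s_next
            if done:
                break
    return theta, v


theta, v = train_actor_critic()
policy = np.array([softmax(row) for row in theta])
```

In this sketch both updates happen on every transition, which is the "simultaneous" learning described above. Off-policy variants additionally reuse stored transitions and correct for the mismatch between the data-collecting policy and the learned one.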

Papers

December 14, 2021

Stochastic Actor-Executor-Critic for Image-to-Image Translation
Ziwei Luo, Jing Hu, Xin Wang, Siwei Lyu, Bin Kong, Youbing Yin, Qi Song, Xi Wu
High Dimensional, Image to Image Translation, Stochastic Way, Model Free, Deep Reinforcement Learning, Latent Action, Policy Actor Critic

November 16, 2021

Off-Policy Actor-Critic with Emphatic Weightings
Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White
Policy Gradient, Policy Actor Critic, Linear Weighting Policy

November 4, 2021

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
Shangtong Zhang, Remi Tachet, Romain Laroche
Policy Gradient, Stochastic Approximation, Global Optimality, Distribution Matching, Finite Sample, Policy Actor Critic
