Actor-Critic
Actor-critic methods are a class of reinforcement learning algorithms that use two neural networks, an actor that selects actions and a critic that estimates their value, to learn optimal policies for controlling systems. Current research focuses on improving the efficiency and stability of these methods, addressing challenges such as inaccurate gradient approximations, exploration in high-dimensional spaces, and handling constraints or partial observability, often building on architectures such as deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO). These advances matter for applications including robotics, control systems, speech generation, and hardware/software co-optimization, where actor-critic methods enable learning complex control policies from limited data.
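To make the two-network structure described above concrete, here is a minimal sketch of a one-step advantage actor-critic update in PyTorch. The network sizes, learning rates, discrete-action setup, and the update helper are illustrative assumptions for this sketch, not taken from any of the papers listed below.

import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical problem dimensions, chosen for illustration only
obs_dim, act_dim = 4, 2

# Actor: maps an observation to logits over discrete actions (the policy)
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
# Critic: maps an observation to a scalar state-value estimate V(s)
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))

actor_opt = optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = optim.Adam(critic.parameters(), lr=1e-3)

def update(obs, action, reward, next_obs, done, gamma=0.99):
    """One-step advantage actor-critic update from a single transition."""
    obs = torch.as_tensor(obs, dtype=torch.float32)
    next_obs = torch.as_tensor(next_obs, dtype=torch.float32)

    # Bootstrapped TD target: r + gamma * V(s'), cut off at episode end
    with torch.no_grad():
        target = reward + gamma * (1.0 - float(done)) * critic(next_obs)
    value = critic(obs)
    # Advantage estimate (TD error); detached so actor gradients
    # do not flow back into the critic
    advantage = (target - value).detach()

    # Critic regresses its value estimate toward the TD target
    critic_loss = (target - value).pow(2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor follows the policy gradient, weighted by the advantage
    dist = torch.distributions.Categorical(logits=actor(obs))
    actor_loss = -(dist.log_prob(torch.as_tensor(action)) * advantage).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

# Example call with dummy transition data
update(obs=[0.1, -0.2, 0.0, 0.3], action=1, reward=1.0,
       next_obs=[0.0, 0.1, 0.1, 0.2], done=False)

Note how the critic's TD error does double duty: it is the regression error for the critic and the advantage weight for the actor's policy gradient, which is the coupling that gives actor-critic methods lower-variance gradients than pure policy-gradient approaches.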
Papers
Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning
Dingyang Chen, Qi Zhang
ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
Andrew Jesson, Chris Lu, Gunshi Gupta, Nicolas Beltran-Velez, Angelos Filos, Jakob Nicolaus Foerster, Yarin Gal