Advantage Actor-Critic

Advantage Actor-Critic (A2C) is a reinforcement learning algorithm that aims to make policy optimization more efficient and stable by combining an actor network, which selects actions, with a critic network, which evaluates them. The critic typically estimates state values, and the advantage measures how much better an action turned out than that estimate predicted. Current research focuses on improving A2C's performance through techniques such as experience replay, variance reduction, and hybrid quantum-classical implementations, as well as on extending it to multi-agent systems and making its learned policies more interpretable. These advances matter for the sample efficiency and applicability of reinforcement learning in diverse fields, including robotics, game playing, and recommender systems.
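As a rough illustration of how the actor and critic interact, the sketch below computes a one-step advantage and the two loss terms for a single transition. It is a minimal example under stated assumptions, not a reference implementation: the function names (`softmax`, `a2c_losses`), the one-step TD target, the 0.5 factor on the critic loss, and the default discount `gamma=0.99` are illustrative choices, and practical A2C implementations usually add n-step returns, an entropy bonus, and batched neural-network updates.

```python
import math


def softmax(logits):
    """Convert raw actor outputs into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_losses(logits, action, reward, value, next_value, done, gamma=0.99):
    """Return (advantage, actor_loss, critic_loss) for one transition.

    logits      -- actor outputs for the current state (hypothetical input)
    action      -- index of the action that was taken
    value       -- critic estimate V(s) for the current state
    next_value  -- critic estimate V(s') for the next state
    done        -- True if the episode ended after this transition
    """
    # One-step TD target: bootstrap from the critic unless the episode ended.
    target = reward + (0.0 if done else gamma * next_value)

    # Advantage: how much better the outcome was than the critic expected.
    advantage = target - value

    # Actor loss: policy-gradient term weighted by the advantage.
    # (In an autodiff framework the advantage is treated as a constant here.)
    log_prob = math.log(softmax(logits)[action])
    actor_loss = -log_prob * advantage

    # Critic loss: squared error between the estimate and the TD target.
    critic_loss = 0.5 * advantage ** 2

    return advantage, actor_loss, critic_loss


if __name__ == "__main__":
    adv, a_loss, c_loss = a2c_losses(
        logits=[0.0, 0.0], action=0, reward=1.0,
        value=0.5, next_value=0.0, done=True,
    )
    print(adv, a_loss, c_loss)
```

A positive advantage raises the probability of the action taken, and a negative one lowers it. Subtracting the critic's estimate as a baseline is what reduces the variance of the policy-gradient signal compared with using raw returns.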

Papers