Policy Mirror Descent
Policy mirror descent (PMD) is a family of reinforcement learning algorithms that updates policies using mirror descent optimization: each iteration pushes the policy toward actions with higher estimated value while a Bregman divergence (most commonly the KL divergence) keeps the new policy close to the old one. The framework applies to single-agent and multi-agent settings and to both discrete and continuous action spaces. Current research focuses on improving convergence rates through techniques such as entropy annealing, and on handling heterogeneous agents and continuous action spaces, often via algorithms such as HAMDPO and variants of mirror ascent. These advances improve sample complexity and computational efficiency, deepening the theoretical understanding of reinforcement learning and broadening the practical reach of these methods to complex problems in robotics, game playing, and other domains.
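The core update is easiest to see in the tabular case with the KL divergence as the mirror map, where one PMD step becomes a multiplicative-weights update: pi'(a|s) is proportional to pi(a|s) * exp(eta * Q(s, a)). Below is a minimal sketch on a hypothetical two-state MDP with exact policy evaluation; the transition and reward numbers are illustrative assumptions, not from any benchmark.

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (all numbers are assumptions for the sketch).
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([  # P[s, a, s']: transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.05, 0.95]],
])
R = np.array([  # R[s, a]: expected immediate rewards
    [1.0, 0.0],
    [0.5, 2.0],
])

def q_values(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi, then
    recover Q(s, a) = R(s, a) + gamma * sum_{s'} P(s, a, s') V(s')."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    R_pi = np.einsum("sa,sa->s", pi, R)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    return R + gamma * P @ V

def pmd_step(pi, eta):
    """One policy mirror descent step with the KL mirror map:
    pi'(a|s) proportional to pi(a|s) * exp(eta * Q(s, a))."""
    logits = np.log(pi) + eta * q_values(pi)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

pi = np.full((n_states, n_actions), 0.5)  # start from the uniform policy
for _ in range(200):
    pi = pmd_step(pi, eta=0.5)
print(np.round(pi, 3))  # inspect the (near-deterministic) learned policy
```

With a fixed step size eta, the iterates provably improve the per-state value monotonically; entropy annealing, mentioned above, instead schedules a shrinking entropy regularizer to sharpen the convergence rate.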