Policy Mirror Descent

Policy Mirror Descent (PMD) is a family of reinforcement learning algorithms that seeks optimal policies efficiently by iteratively updating the policy distribution using gradient information, typically regularizing each update with a Bregman divergence and projecting back onto the probability simplex. Current research focuses on improving convergence rates, in particular achieving linear convergence, through techniques such as functional acceleration, adaptive step sizes, and mirror maps beyond the commonly used negative entropy. This work is significant for its theoretical rigor in establishing convergence guarantees and for its practical value in developing more efficient and robust reinforcement learning algorithms across a range of problem settings, including those with large state or action spaces and multi-agent scenarios.
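
As a concrete illustration, below is a minimal tabular sketch of a PMD iteration using the negative-entropy mirror map, so the Bregman divergence is the KL divergence and each update has the familiar multiplicative, softmax-like closed form pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * Q^{pi_k}(s, a)). The MDP inputs `P` and `R`, the step size `eta`, and the helper function names are illustrative assumptions for this sketch, not drawn from any particular paper listed here.

```python
import numpy as np


def evaluate_q(P, R, pi, gamma=0.99):
    """Exact policy evaluation in a tabular MDP.

    P: transitions, shape (S, A, S); R: rewards, shape (S, A);
    pi: policy, shape (S, A) with rows summing to 1.
    Returns Q^pi of shape (S, A).
    """
    S, _ = R.shape
    # State transition matrix and reward vector under pi.
    P_pi = np.einsum("sap,sa->sp", P, pi)
    r_pi = np.einsum("sa,sa->s", R, pi)
    # Solve (I - gamma * P_pi) V = r_pi for the state values.
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    # One-step lookahead gives the action values.
    return R + gamma * (P @ V)


def pmd_step(pi, Q, eta):
    """PMD update with the negative-entropy mirror map (KL regularization):
    pi_{k+1}(a|s) ∝ pi_k(a|s) * exp(eta * Q(s, a))."""
    logits = np.log(pi + 1e-12) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)


def policy_mirror_descent(P, R, gamma=0.99, eta=1.0, iters=200):
    """Run PMD from the uniform policy for a fixed number of iterations."""
    S, A = R.shape
    pi = np.full((S, A), 1.0 / A)
    for _ in range(iters):
        Q = evaluate_q(P, R, pi, gamma)
        pi = pmd_step(pi, Q, eta)
    return pi


if __name__ == "__main__":
    # Small random MDP purely for demonstration.
    rng = np.random.default_rng(0)
    S, A = 4, 3
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)
    R = rng.random((S, A))
    print(np.round(policy_mirror_descent(P, R), 3))
```

With a constant step size this sketch corresponds to the unaccelerated, exact-evaluation setting; the accelerated and adaptive-step-size variants studied in the papers below modify how `eta` is chosen and how the update direction is constructed, while keeping the same mirror-descent structure.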

Papers