Entropy-Regularized Markov Decision Processes

Entropy-regularized Markov Decision Processes (ER-MDPs) extend standard reinforcement learning by augmenting the reward with an entropy bonus on the policy, which promotes exploration and adds robustness to uncertainty in model parameters. Current research focuses on convergence-rate analyses of algorithms such as policy gradient methods and mirror descent, often with neural-network approximations for continuous state and action spaces. This line of work matters because it provides theoretical performance guarantees for ER-MDPs, sharpens our understanding of the exploration-exploitation trade-off, and yields more sample-efficient and robust reinforcement learning algorithms for applications such as imitation learning and control.
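
To make the idea concrete, the sketch below solves a tiny entropy-regularized MDP with soft value iteration, where the usual max in the Bellman backup is replaced by a temperature-scaled log-sum-exp and the optimal policy becomes a softmax over Q-values. The two-state MDP (arrays P and R), the temperature tau, and the discount gamma are hypothetical values chosen purely for illustration; none of them come from a specific paper.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP used only for illustration.
# P[s, a, s'] are transition probabilities, R[s, a] are immediate rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma, tau = 0.95, 0.5  # discount factor and entropy temperature (assumed values)

V = np.zeros(2)
for _ in range(1000):
    # Soft Bellman backup: V(s) = tau * log sum_a exp(Q(s, a) / tau)
    Q = R + gamma * (P @ V)                      # Q[s, a]
    V_new = tau * np.log(np.exp(Q / tau).sum(axis=1))
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# The entropy-regularized optimal policy is a softmax over Q-values.
pi = np.exp((Q - V[:, None]) / tau)
print("soft values:", V)
print("policy:\n", pi)
```

As tau approaches zero, the log-sum-exp backup recovers the standard Bellman optimality operator and the policy becomes greedy; larger tau keeps the policy stochastic, which is the exploration effect the paragraph above refers to.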

Papers