Constrained Markov Decision Process

Constrained Markov Decision Processes (CMDPs) extend standard reinforcement learning by adding constraints on actions or on cumulative costs, enabling safe decision-making in real-world settings where safety or resource requirements limit what an agent may do. Current research heavily focuses on developing efficient algorithms, particularly primal-dual methods and policy optimization approaches, that achieve optimal bounds on regret (the gap between the optimal policy's cumulative reward and the learner's) and on constraint violation, under various feedback models (full-information or bandit) and constraint types (stochastic or adversarial). These advances are crucial for deploying reinforcement learning in safety-critical applications like autonomous driving and robotics, where adhering to constraints is paramount.
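To make the primal-dual idea concrete, below is a minimal sketch (not any specific paper's algorithm) on a hypothetical known-model, discounted, two-state CMDP: the primal step computes a best-response policy to the Lagrangian reward r − λc by value iteration, and the dual step raises the multiplier λ whenever the policy's expected cost exceeds the budget τ. All quantities here (P, r, c, τ, the step size) are illustrative assumptions.

```python
import numpy as np

# Illustrative toy CMDP: 2 states, 2 actions, discounted setting (all values assumed).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'] transition probabilities
              [[0.7, 0.3], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],                 # r[s, a] rewards
              [0.5, 2.0]])
c = np.array([[0.0, 1.0],                 # c[s, a] costs
              [0.2, 1.5]])
gamma, tau = 0.9, 2.0                     # discount factor and cost budget (assumed)

def value_iteration(reward, iters=500):
    """Greedy policy for the unconstrained MDP with the given reward."""
    V = np.zeros(2)
    for _ in range(iters):
        Q = reward + gamma * P @ V        # Q[s, a] = reward + gamma * E[V(s')]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def discounted_return(pi, signal, start=0, iters=500):
    """Exact discounted return of deterministic policy pi under `signal` (r or c)."""
    V = np.zeros(2)
    for _ in range(iters):
        V = np.array([signal[s, pi[s]] + gamma * P[s, pi[s]] @ V for s in range(2)])
    return V[start]

lam, eta = 0.0, 0.05                      # Lagrange multiplier and dual step size
for _ in range(200):
    # Primal step: best response to the Lagrangian reward r - lam * c.
    pi = value_iteration(r - lam * c)
    # Dual step: projected subgradient ascent on the constraint violation.
    Jc = discounted_return(pi, c)
    lam = max(0.0, lam + eta * (Jc - tau))

print("policy:", pi, " cost:", round(Jc, 3), " lambda:", round(lam, 3))
```

In the learning settings studied in the papers below, the exact best-response and cost evaluation above are replaced by policy-optimization updates and empirical estimates from sampled trajectories, which is where the regret and constraint-violation guarantees come in.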

Papers