Continuous-Time Markov Decision Process

Continuous-time Markov decision processes (CTMDPs) model sequential decision-making in systems that evolve in continuous time with stochastic transitions; the goal is to find policies that maximize expected cumulative reward. Current research emphasizes efficient exploration, particularly with model-free methods, and stability guarantees for learning algorithms in continuous state-action spaces, often using techniques such as linear function approximation. These advances matter for real-world problems such as epidemic control and autonomous navigation, where both accurate modeling of continuous dynamics and efficient learning are essential. A major focus is the development of rigorous theoretical bounds on learning performance, alongside practical algorithms for solving CTMDPs with complex objectives.
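To make the definition concrete, here is a minimal sketch of solving a small CTMDP. It is an illustration under stated assumptions, not a method taken from the papers listed on this page: it assumes a finite state and action space, a known model, and an infinite-horizon discounted objective, whereas the research summarized above largely targets model-free learning in continuous spaces. The sketch uses uniformization, a standard technique that converts the CTMDP into an equivalent discrete-time MDP, followed by value iteration. The function name `solve_ctmdp`, the two-state repair example, and all numeric values are hypothetical.

```python
import numpy as np


def solve_ctmdp(Q, r, beta, tol=1e-10, max_iter=100_000):
    """Discounted finite CTMDP via uniformization + value iteration (sketch).

    Q    : (A, S, S) array, Q[a] is the generator (rate) matrix under action a;
           off-diagonals are transition rates, each row sums to zero.
    r    : (S, A) array of reward rates (reward earned per unit time).
    beta : continuous-time discount rate, must be > 0.
    """
    A, S, _ = Q.shape
    idx = np.arange(S)
    # Uniformization rate: at least as large as every exit rate -Q[a][s, s].
    lam = max(float(np.max(-Q[:, idx, idx])), 1e-12)
    # Equivalent discrete-time MDP: transition matrices, discount, rewards.
    P = np.eye(S)[None, :, :] + Q / lam      # (A, S, S), rows are distributions
    gamma = lam / (beta + lam)
    r_step = r / (beta + lam)

    V = np.zeros(S)
    for _ in range(max_iter):
        # q_values[s, a] = r_step[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
        q_values = r_step + gamma * (P @ V).T
        V_new = q_values.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    # Greedy policy with respect to the final value estimate.
    policy = (r_step + gamma * (P @ V).T).argmax(axis=1)
    return V, policy


# Hypothetical example: state 0 = machine working, state 1 = machine broken.
# Action 0 = slow repair, action 1 = fast (but costly) repair.
Q = np.array([
    [[-1.0, 1.0], [2.0, -2.0]],   # action 0: breaks at rate 1, repaired at rate 2
    [[-1.0, 1.0], [5.0, -5.0]],   # action 1: breaks at rate 1, repaired at rate 5
])
r = np.array([
    [10.0, 10.0],                 # working: earns 10 per unit time
    [0.0, -3.0],                  # broken: fast repair costs 3 per unit time
])
beta = 0.1

V, policy = solve_ctmdp(Q, r, beta)
print("values:", V)
print("policy:", policy)
```

Uniformization is chosen here because it reduces the continuous-time problem to ordinary discounted value iteration; the resulting values satisfy the continuous-time optimality equation beta * V(s) = max_a [ r(s, a) + sum_s' Q_a(s, s') V(s') ], which gives a simple correctness check for the output.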

Papers