Semi Markov Decision Process

Semi-Markov Decision Processes (SMDPs) extend Markov Decision Processes by allowing for variable-length time steps between decisions, reflecting the realities of many real-world systems where events don't occur at fixed intervals. Current research focuses on developing efficient algorithms, such as asynchronous stochastic approximation and deep reinforcement learning methods (including Q-learning variants), to solve SMDPs, particularly in complex scenarios with continuous action spaces or hierarchical structures (like option-based approaches). These advancements are improving the modeling and optimization of diverse applications, including dynamic vehicle routing, resource allocation (e.g., caching and medical evacuation), and other systems with stochastic and time-varying dynamics. The resulting improved decision-making capabilities have significant implications across various fields.

Papers