Markov Decision Process
Markov Decision Processes (MDPs) are mathematical frameworks for modeling sequential decision-making under uncertainty, with the goal of finding optimal policies that maximize expected cumulative reward. Current research emphasizes efficient algorithms for solving MDPs, particularly in complex settings such as partially observable MDPs (POMDPs) and constrained MDPs (CMDPs), often employing techniques like policy gradient methods, Q-learning, and active inference. These advances are important for the design and analysis of autonomous systems, robotics, and other applications requiring intelligent decision-making in dynamic environments, with a growing focus on safety, robustness, and sample efficiency.
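To make the core MDP machinery concrete, here is a minimal value-iteration sketch in Python/NumPy on a hypothetical 3-state, 2-action tabular MDP. The transition array P and reward array R are illustrative assumptions chosen only to demonstrate the Bellman optimality backup; they are not taken from any of the listed papers.

```python
import numpy as np

# Minimal value-iteration sketch for a toy tabular MDP.
# The 3-state, 2-action transition/reward arrays below are hypothetical,
# chosen only to illustrate the Bellman backup.
n_states, n_actions, gamma = 3, 2, 0.95

# P[s, a, s'] = transition probability; R[s, a] = expected immediate reward
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],
    [[0.0, 0.6, 0.4], [0.5, 0.5, 0.0]],
    [[0.0, 0.0, 1.0], [0.3, 0.0, 0.7]],
])
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
    [5.0, 0.5],
])

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
    Q = R + gamma * P @ V
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy with respect to the converged values
print("Optimal values:", V)
print("Greedy policy:", policy)
```

The same backup underlies Q-learning, which estimates Q(s, a) from sampled transitions instead of a known model; the POMDP and non-Markovian settings studied in the papers below generalize this picture to cases where the state is not directly observed.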
Papers
Learning Memory Mechanisms for Decision Making through Demonstrations
William Yue, Bo Liu, Peter Stone
Dynamical-VAE-based Hindsight to Learn the Causal Dynamics of Factored-POMDPs
Chao Han, Debabrota Basu, Michael Mangan, Eleni Vasilaki, Aditya Gilra
Robust Offline Reinforcement Learning for Non-Markovian Decision Processes
Ruiquan Huang, Yingbin Liang, Jing Yang