Markov Decision Process
Markov Decision Processes (MDPs) are mathematical frameworks for modeling sequential decision-making problems under uncertainty; the goal is to find an optimal policy that maximizes cumulative reward. Current research emphasizes efficient algorithms for solving MDPs, particularly in complex settings such as partially observable MDPs (POMDPs) and constrained MDPs (CMDPs), often employing techniques like policy gradient methods, Q-learning, and active inference. These advances are crucial for the design and analysis of autonomous systems, robotics, and other applications requiring intelligent decision-making in dynamic environments, with a growing focus on safety, robustness, and sample efficiency.
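To make the Q-learning technique mentioned above concrete, here is a minimal sketch of tabular Q-learning on a hypothetical two-state, two-action MDP. The environment, hyperparameters, and state/action layout are illustrative assumptions, not drawn from any of the papers listed below.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative only):
# in state 0, action 1 moves to state 1 with reward 0;
# in state 1, action 1 yields reward +1 and returns to state 0;
# action 0 is a self-loop with reward 0 in both states.
transition = {
    (0, 0): (0, 0.0),
    (0, 1): (1, 0.0),
    (1, 0): (1, 0.0),
    (1, 1): (0, 1.0),
}

n_states, n_actions = 2, 2
gamma, alpha, eps = 0.9, 0.1, 0.1  # discount, step size, exploration rate

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))

s = 0
for _ in range(20000):
    # epsilon-greedy action selection
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = transition[(s, a)]
    # Q-learning update: bootstrap from the greedy value of the next state
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

# The learned greedy policy chooses action 1 in both states.
print(Q.argmax(axis=1))
```

In this toy example the optimal policy is to take action 1 everywhere (cycling between the states to collect the +1 reward), and the learned greedy policy recovers it; the same update rule scales to any finite state-action table.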
Papers
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo
On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes
Yi Wan, Huizhen Yu, Richard S. Sutton
Reduce, Reuse, Recycle: Categories for Compositional Reinforcement Learning
Georgios Bakirtzis, Michail Savvas, Ruihan Zhao, Sandeep Chinchali, Ufuk Topcu
Optimally Solving Simultaneous-Move Dec-POMDPs: The Sequential Central Planning Approach
Johan Peralez, Aurélien Delage, Jacopo Castellini, Rafael F. Cunha, Jilles S. Dibangoye
Hybrid Recurrent Models Support Emergent Descriptions for Hierarchical Planning and Control
Poppy Collis, Ryan Singh, Paul F Kinghorn, Christopher L Buckley
An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing
Xinlang Yue, Yiran Liu, Fangzhou Shi, Sihong Luo, Chen Zhong, Min Lu, Zhe Xu
Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning
Manav Vora, Michael N Grussing, Melkior Ornik
Heavy-Ball Momentum Accelerated Actor-Critic With Function Approximation
Yanjie Dong, Haijun Zhang, Gang Wang, Shisheng Cui, Xiping Hu
Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity
Mohammad Boveiri, Peyman Mohajerin Esfahani