Partially Observable Markov Decision Process
Partially observable Markov decision processes (POMDPs) model sequential decision-making under uncertainty in which the agent cannot observe the current state directly and must act on noisy or incomplete observations. The goal is to find a policy that maximizes expected cumulative reward. Current research emphasizes efficient algorithms for solving POMDPs, particularly those with continuous action spaces and sparse rewards, often employing techniques such as Monte Carlo tree search, policy gradient methods, and neural network approximations of value functions and policies. These advances are driving progress in diverse fields, including robotics (e.g., robust control, task planning), healthcare (e.g., optimized diagnosis and treatment), and resource management (e.g., carbon storage optimization), by enabling more effective decision-making in complex, uncertain environments.
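To make the idea of an unobserved state concrete: an agent in a POMDP commonly maintains a belief, that is, a probability distribution over states, and revises it with Bayes' rule after each action and observation. The following is a minimal sketch for a small discrete problem. The two-state "tiger" toy setup, the 0.85 listening accuracy, and the names belief_update, T, and O are assumptions chosen for illustration and are not taken from the papers listed here.

```python
def belief_update(belief, action, observation, T, O):
    """Discrete Bayes filter.

    New belief b'(s') is proportional to
    O[action][s'][observation] * sum over s of T[action][s][s'] * belief[s].
    """
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [
        sum(T[action][s][s2] * belief[s] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each state by the likelihood of the observation.
    unnormalized = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [p / total for p in unnormalized]


# Hypothetical toy problem: state 0 = tiger behind left door, state 1 = right door.
LISTEN = 0
HEAR_LEFT = 0

# Listening does not move the tiger (identity transition model).
T = {LISTEN: [[1.0, 0.0], [0.0, 1.0]]}
# Assumed sensor model: the agent hears the correct side 85% of the time.
O = {LISTEN: [[0.85, 0.15], [0.15, 0.85]]}

belief = [0.5, 0.5]  # start with no information about the tiger's position
belief = belief_update(belief, LISTEN, HEAR_LEFT, T, O)
print(belief)
```

Under these assumed numbers, one "hear left" observation shifts the belief from an even split toward the left-door state, and further consistent observations sharpen it. Exact updates like this only scale to small discrete state spaces; the sampling-based and neural methods mentioned above are generally used to approximate the belief or the policy when exact filtering is infeasible.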
Papers
Learning Reward Machines: A Study in Partially Observable Reinforcement Learning
Rodrigo Toro Icarte, Ethan Waldie, Toryn Q. Klassen, Richard Valenzano, Margarita P. Castro, Sheila A. McIlraith
Compositional Learning-based Planning for Vision POMDPs
Sampada Deglurkar, Michael H. Lim, Johnathan Tucker, Zachary N. Sunberg, Aleksandra Faust, Claire J. Tomlin