Latent Markov Decision Process
Latent Markov Decision Processes (LMDPs) model sequential decision-making problems in which a latent context, drawn at the start of each episode, governs the dynamics and rewards but remains hidden from the agent, posing significant challenges for reinforcement learning. Current research focuses on sample-efficient algorithms that cope with the statistical and computational hardness introduced by the unobserved latent context, often leveraging techniques such as off-policy evaluation and the incorporation of prospective side information. These advances aim to improve the efficiency and theoretical guarantees of reinforcement learning in partially observable environments, with implications for applications such as dialogue systems and online combinatorial optimization. A key trend is the development of problem-dependent regret bounds and horizon-free algorithms that move beyond worst-case analyses.
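As a minimal illustration of the model itself (not of any particular algorithm from this literature), the sketch below implements a tabular LMDP: an MDP index is drawn from a prior at the start of each episode and held fixed, but only states and rewards are ever returned to the agent. The class name, array shapes, and the toy two-context instance are all illustrative assumptions.

```python
import numpy as np

class LatentMDP:
    """Minimal tabular LMDP sketch: at the start of each episode a latent
    MDP index m ~ w is drawn and held fixed, hidden from the agent."""

    def __init__(self, transitions, rewards, prior, horizon, seed=0):
        self.P = np.asarray(transitions)  # shape (M, S, A, S): per-context dynamics
        self.R = np.asarray(rewards)      # shape (M, S, A): per-context rewards
        self.w = np.asarray(prior)        # prior over the M latent contexts
        self.H = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # The latent context is resampled each episode and never revealed.
        self.m = self.rng.choice(len(self.w), p=self.w)
        self.s = 0
        self.t = 0
        return self.s                     # observation = state only, not m

    def step(self, a):
        r = float(self.R[self.m, self.s, a])
        self.s = int(self.rng.choice(self.P.shape[-1], p=self.P[self.m, self.s, a]))
        self.t += 1
        return self.s, r, self.t >= self.H

if __name__ == "__main__":
    # Toy instance: two contexts with opposite reward structure, so the
    # optimal action depends entirely on the hidden context.
    M, S, A = 2, 2, 2
    P = np.full((M, S, A, S), 0.5)        # uniform transitions in every context
    R = np.zeros((M, S, A))
    R[0, :, 0] = 1.0                      # context 0 rewards action 0
    R[1, :, 1] = 1.0                      # context 1 rewards action 1
    env = LatentMDP(P, R, prior=[0.5, 0.5], horizon=5)
    s, done = env.reset(), False
    while not done:
        s, r, done = env.step(0)          # any fixed action earns 0.5/step in expectation
```

Because the observation excludes the latent index, a good policy must implicitly infer the context from the history of states and rewards; this is the source of the hardness, and of the appeal of side information, described above.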