Latent Markov Decision Process
Latent Markov Decision Processes (LMDPs) model sequential decision-making problems in which a latent context, drawn at the start of each episode, governs the dynamics and rewards but remains hidden from the agent, posing significant challenges for reinforcement learning. Current research focuses on sample-efficient algorithms that cope with the statistical and computational hardness introduced by the unobserved latent context, often leveraging techniques such as off-policy evaluation and the incorporation of prospective side information. These advances aim to improve the efficiency and theoretical guarantees of reinforcement learning in partially observable environments, with implications for applications such as dialogue systems and online combinatorial optimization. A key trend is the development of problem-dependent regret bounds and horizon-free algorithms that move beyond worst-case analyses.
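As a minimal illustration of the model itself (not of any particular algorithm from this literature), the sketch below implements a tabular LMDP: an MDP index is drawn from a prior at the start of each episode and held fixed, but only states and rewards are ever returned to the agent. The class name, array shapes, and the toy two-context instance are all illustrative assumptions.

```python
import numpy as np

class LatentMDP:
    """Minimal tabular LMDP sketch: at the start of each episode a latent
    MDP index m ~ w is drawn and held fixed, hidden from the agent."""

    def __init__(self, transitions, rewards, prior, horizon, seed=0):
        self.P = np.asarray(transitions)  # shape (M, S, A, S): per-context dynamics
        self.R = np.asarray(rewards)      # shape (M, S, A): per-context rewards
        self.w = np.asarray(prior)        # prior over the M latent contexts
        self.H = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # The latent context is resampled each episode and never revealed.
        self.m = self.rng.choice(len(self.w), p=self.w)
        self.s = 0
        self.t = 0
        return self.s                     # observation = state only, not m

    def step(self, a):
        r = float(self.R[self.m, self.s, a])
        self.s = int(self.rng.choice(self.P.shape[-1], p=self.P[self.m, self.s, a]))
        self.t += 1
        return self.s, r, self.t >= self.H

if __name__ == "__main__":
    # Toy instance: two contexts with opposite reward structure, so the
    # optimal action depends entirely on the hidden context.
    M, S, A = 2, 2, 2
    P = np.full((M, S, A, S), 0.5)        # uniform transitions in every context
    R = np.zeros((M, S, A))
    R[0, :, 0] = 1.0                      # context 0 rewards action 0
    R[1, :, 1] = 1.0                      # context 1 rewards action 1
    env = LatentMDP(P, R, prior=[0.5, 0.5], horizon=5)
    s, done = env.reset(), False
    while not done:
        s, r, done = env.step(0)          # any fixed action earns 0.5/step in expectation
```

Because the observation excludes the latent index, a good policy must implicitly infer the context from the history of states and rewards; this is the source of the hardness, and of the appeal of side information, described above.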