Infinite Horizon
Infinite-horizon problems in Markov Decision Processes (MDPs) concern optimizing cumulative rewards or costs over an unbounded time horizon, typically under a discounted-reward or average-reward criterion. Current research emphasizes efficient algorithms, such as policy gradient methods, Thompson sampling, and value iteration variants, often combined with function approximation and regularization to handle large or continuous state and action spaces. These advances address core challenges in reinforcement learning, control theory, and operations research, and the resulting gains in algorithmic efficiency and theoretical understanding carry over to applications ranging from robotics and healthcare to resource management and finance.
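To make the value iteration variants mentioned above concrete, here is a minimal sketch of tabular value iteration for a discounted infinite-horizon MDP. The function name `value_iteration` and the 2-state, 2-action toy model are illustrative assumptions, not drawn from any of the papers listed below.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Compute a near-optimal value function and greedy policy.

    P: transition probabilities, shape (S, A, S), P[s, a, s2] = Pr(s2 | s, a)
    R: expected immediate rewards, shape (S, A)
    gamma: discount factor in [0, 1)
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iters):
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s2 P(s,a,s2) V(s2)
        Q = R + gamma * P @ V          # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol * (1 - gamma) / (2 * gamma):
            V = V_new
            break
        V = V_new
    policy = Q.argmax(axis=1)
    return V, policy

# Toy 2-state, 2-action MDP (values chosen arbitrarily for illustration).
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],   # transitions from state 0 under actions 0, 1
    [[0.5, 0.5], [0.0, 1.0]],   # transitions from state 1 under actions 0, 1
])
R = np.array([
    [1.0, 0.0],                 # rewards in state 0 for actions 0, 1
    [0.0, 2.0],                 # rewards in state 1 for actions 0, 1
])

V, policy = value_iteration(P, R, gamma=0.95)
print("V* ~", V, "greedy policy:", policy)
```

The stopping threshold tol * (1 - gamma) / (2 * gamma) is the standard contraction-based criterion guaranteeing that the greedy policy with respect to the returned value function is within tol of optimal.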
Papers
Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
Bhargav Ganguly, Yang Xu, Vaneet Aggarwal
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
Washim Uddin Mondal, Vaneet Aggarwal
Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach
Dengwang Tang, Rahul Jain, Botao Hao, Zheng Wen
Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs
Yu-Heng Hung, Ping-Chun Hsieh, Akshay Mete, P. R. Kumar