Episodic Markov Decision Process
Episodic Markov Decision Processes (EMDPs) model sequential decision-making problems in which the agent interacts with the environment in episodes of fixed horizon, with the goal of learning a policy that maximizes the cumulative reward collected per episode. Current research emphasizes provably efficient algorithms, particularly model-free methods and settings with function approximation, often employing techniques such as upper confidence bounds, posterior sampling, and reference-advantage decomposition to handle stochasticity and improve sample efficiency. These advances matter both for the theoretical understanding of reinforcement learning and for practical applications, enabling faster and more robust learning in complex environments with limited data.
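As a concrete illustration of the optimism-based, model-free techniques mentioned above, the sketch below runs UCB-bonus Q-learning on a small, randomly generated tabular episodic MDP. It is a minimal sketch under assumed settings: the environment tables, the state/action counts, the horizon, the episode budget, and the bonus constant `c` are all illustrative choices, not parameters taken from any of the papers listed here.

```python
import numpy as np

# Minimal sketch: optimistic (UCB-bonus) Q-learning in a tabular episodic MDP.
# The random environment and all constants below are illustrative assumptions.

rng = np.random.default_rng(0)

S, A, H = 5, 3, 10   # states, actions, horizon (assumed sizes)
K = 2000             # number of episodes (assumed budget)
c = 0.5              # exploration-bonus scaling constant (assumed)

# Random tabular MDP: P[h, s, a] is a distribution over next states,
# R[h, s, a] is a deterministic reward in [0, 1].
P = rng.dirichlet(np.ones(S), size=(H, S, A))
R = rng.uniform(0.0, 1.0, size=(H, S, A))

# Optimistic initialization: Q = H upper-bounds the value of any policy.
Q = np.full((H, S, A), float(H))
V = np.zeros((H + 1, S))             # V[H] = 0 at the end of the episode
N = np.zeros((H, S, A), dtype=int)   # visit counts per (step, state, action)

total_reward = 0.0
for k in range(K):
    s = rng.integers(S)              # arbitrary initial state each episode
    for h in range(H):
        a = int(np.argmax(Q[h, s]))  # act greedily w.r.t. the optimistic Q
        r = R[h, s, a]
        s_next = rng.choice(S, p=P[h, s, a])
        total_reward += r

        # Step size and exploration bonus derived from the visit count.
        N[h, s, a] += 1
        t = N[h, s, a]
        alpha = (H + 1) / (H + t)
        bonus = c * np.sqrt(H**3 * np.log(S * A * H * K) / t)

        # Optimistic Q-learning update using the next step's value estimate.
        target = r + V[h + 1, s_next] + bonus
        Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * target
        V[h, s] = min(H, Q[h, s].max())

        s = s_next

print(f"average per-episode return over {K} episodes: {total_reward / K:.2f}")
```

The count-based bonus keeps the Q-estimates optimistic, so the greedy policy is steered toward under-visited state-action pairs; as counts grow the bonus shrinks and the estimates concentrate, which is the mechanism behind the sample-efficiency guarantees of this line of work.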