Episodic Markov Decision Process
Episodic Markov Decision Processes (EMDPs) model sequential decision-making problems in which each interaction (episode) ends after a fixed number of steps, and the goal is to learn a policy that maximizes the cumulative reward collected over an episode. Current research emphasizes provably efficient algorithms, particularly model-free approaches and settings with function approximation, often using techniques such as upper confidence bounds, posterior sampling, and reference-advantage decomposition to handle stochasticity and improve sample efficiency. These advances matter both for the theoretical understanding of reinforcement learning and for practical applications, where they support faster and more robust learning in complex environments with limited data.
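To make the fixed-horizon setting concrete, here is a minimal sketch of backward induction (finite-horizon value iteration) for a small tabular EMDP whose transitions and rewards are already known. It is only the planning baseline that learning algorithms are measured against, not any of the research methods mentioned above. The function name `backward_induction`, the array layout, and the two-state toy problem are all assumptions made up for this illustration.

```python
import numpy as np

def backward_induction(P, R, H):
    """Plan in a known tabular episodic MDP (illustrative sketch).

    P: array of shape (S, A, S), P[s, a, s2] = probability of moving s -> s2 under a.
    R: array of shape (S, A), expected immediate reward.
    H: number of steps per episode (the fixed horizon).
    Returns step-indexed values V (H+1, S) and a greedy policy pi (H, S).
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S))            # V[H] = 0: no reward after the episode ends
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):      # sweep backward from the last step
        Q = R + P @ V[h + 1]            # Q[s, a] = R[s, a] + E[V[h+1](next state)]
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return V, pi

# Hypothetical toy problem: state 1 pays reward 1 per step and is absorbing;
# from state 0, action 1 moves to state 1 and action 0 stays put (reward 0).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])

V, pi = backward_induction(P, R, H=3)
print(V[0])    # optimal expected return from each state at the first step
print(pi[0])   # greedy action in each state at the first step
```

Because the horizon is fixed, both the value function and the policy are indexed by the step h as well as by the state. That is the main structural difference from the discounted infinite-horizon setting, where a single stationary policy suffices.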