Average Reward Markov Decision Process
Average Reward Markov Decision Processes (AMDPs) are sequential decision-making problems in which the goal is a policy that maximizes the long-run average reward per step, rather than a discounted sum of rewards. Current research emphasizes efficient model-based and model-free algorithms with improved regret bounds and sample complexities, often building on policy gradient methods, value iteration, and function approximation (e.g., linear or kernel-based). These advances strengthen both the theoretical understanding and the practical applicability of reinforcement learning in fields such as robotics, control systems, and resource management, where long-term average performance is crucial.
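To make the average-reward objective concrete, here is a minimal sketch of relative value iteration, one classical form of value iteration for this setting. It assumes a small tabular MDP that is unichain and aperiodic, so the iteration converges. The function name, the array layout (`P` indexed as action, state, next state; `R` indexed as state, action), and the two-state example are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def relative_value_iteration(P, R, ref_state=0, tol=1e-10, max_iter=10_000):
    """Relative value iteration for a tabular average-reward MDP.

    P: array of shape (A, S, S), P[a, s, t] = Pr(next state t | state s, action a).
    R: array of shape (S, A), expected one-step reward.
    Returns (gain, h, policy): the estimated optimal average reward per step,
    the relative value (bias) vector with h[ref_state] == 0, and a greedy policy.
    Assumes the MDP is unichain and aperiodic (hypothetical setup).
    """
    n_states = P.shape[1]
    h = np.zeros(n_states)
    gain = 0.0
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + sum_t P[a, s, t] * h[t]
        Q = R + np.einsum("ast,t->sa", P, h)
        Th = Q.max(axis=1)
        # Subtracting the value at a reference state keeps h bounded;
        # the subtracted amount converges to the optimal gain.
        gain = Th[ref_state]
        h_new = Th - gain
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    policy = Q.argmax(axis=1)
    return gain, h, policy

# Illustrative two-state MDP: reward 1 per step in state 1, reward 0 in state 0.
# Action 0 tends to stay in the current state, action 1 tends to switch.
P = np.array([
    [[0.9, 0.1], [0.1, 0.9]],  # action 0
    [[0.1, 0.9], [0.9, 0.1]],  # action 1
])
R = np.array([[0.0, 0.0], [1.0, 1.0]])

gain, h, policy = relative_value_iteration(P, R)
print(gain, h, policy)
```

In this example the greedy policy switches out of state 0 and stays in state 1, so the process spends 90% of its time in the rewarding state and the average reward per step is 0.9. No discount factor appears anywhere: the scalar gain, not a discounted value, is the quantity being optimized.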