Proxy Reward
Proxy reward methods address the challenge of training reinforcement learning agents when the true reward is delayed or sparse: the agent optimizes a surrogate signal that is cheaper or more frequent to obtain. Current research focuses on building cost-effective proxy reward models, often using active learning and on-policy data collection to reduce the amount of human feedback required. It also explores ensemble methods and alternative reward optimization strategies to mitigate over-optimization, where a policy exploits flaws in the proxy instead of improving on the true objective. These advances matter for the sample efficiency and robustness of reinforcement learning, particularly in complex real-world applications such as large language model alignment and recommender systems.
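To make the ensemble idea concrete, here is a minimal Python sketch of one common way to use several proxy reward models against over-optimization: score a candidate with every model, then aggregate conservatively, either by penalizing disagreement (mean minus a multiple of the standard deviation) or by taking the worst case. This is an illustration under stated assumptions, not the method of any particular paper listed here. The three toy reward models, the `penalty` parameter, and all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence

# A proxy reward model maps a candidate response to a scalar score.
RewardModel = Callable[[str], float]


def conservative_reward(response: str, models: Sequence[RewardModel],
                        penalty: float = 1.0) -> float:
    """Ensemble mean minus `penalty` times the ensemble standard deviation.

    High disagreement between models lowers the score, so a response that
    exploits a flaw in a single model is rewarded less.
    """
    scores = [m(response) for m in models]
    return mean(scores) - penalty * pstdev(scores)


def worst_case_reward(response: str, models: Sequence[RewardModel]) -> float:
    """Most pessimistic score across the ensemble."""
    return min(m(response) for m in models)


# Hypothetical toy proxies, each flawed in a different way.
def length_model(response: str) -> float:
    # Rewards longer responses; exploitable by repetition.
    return float(len(response.split()))


def diversity_model(response: str) -> float:
    # Rewards the number of distinct words.
    return float(len(set(response.split())))


def reasoning_model(response: str) -> float:
    # Rewards explicit justification, crudely detected by a keyword.
    return 2.0 * response.count("because")


ENSEMBLE = [length_model, diversity_model, reasoning_model]

if __name__ == "__main__":
    candidates = ["good good good good good good",
                  "it works because tests pass"]
    # The single length proxy prefers the repetitive candidate.
    print(max(candidates, key=length_model))
    # The conservative ensemble score prefers the other one.
    print(max(candidates, key=lambda r: conservative_reward(r, ENSEMBLE)))
```

In this toy setup the repetitive candidate wins under the length proxy alone but loses under either ensemble aggregate, because the other two models disagree with the first. Real systems would replace the toy functions with learned reward models and tune `penalty` empirically.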