Proxy Reward

Proxy reward methods help train reinforcement learning agents when the true reward is delayed or sparse by substituting a surrogate signal the agent can learn from more efficiently. Current research focuses on cost-effective proxy reward models, often using active learning and on-policy data collection to reduce the amount of human feedback required. It also explores ensemble methods and alternative reward optimization strategies to mitigate over-optimization, where a policy exploits errors in the proxy instead of improving on the true objective. These advances improve the sample efficiency and robustness of reinforcement learning, particularly in complex real-world applications such as large language model alignment and recommender systems.
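
To make the ensemble idea concrete, here is a minimal sketch of one common way an ensemble of proxy reward models can be combined conservatively: penalize the mean score by the disagreement between members, or take the worst-case score. It is not drawn from any specific paper in this topic. The function names, the `penalty` parameter, and the toy scores are all hypothetical and chosen only for illustration.

```python
import statistics
from typing import Callable, Sequence

# A proxy reward model maps a candidate response to a scalar score.
RewardModel = Callable[[str], float]


def mean_reward(models: Sequence[RewardModel], response: str) -> float:
    """Plain ensemble average, with no protection against disagreement."""
    return statistics.fmean(m(response) for m in models)


def conservative_reward(
    models: Sequence[RewardModel], response: str, penalty: float = 1.0
) -> float:
    """Mean score minus a penalty proportional to ensemble disagreement.

    `penalty` is a hypothetical tuning knob: larger values trust the
    ensemble less when its members disagree.
    """
    scores = [m(response) for m in models]
    return statistics.fmean(scores) - penalty * statistics.pstdev(scores)


def worst_case_reward(models: Sequence[RewardModel], response: str) -> float:
    """Most pessimistic score across the ensemble."""
    return min(m(response) for m in models)


# Toy data (made up): three models agree on the plain answer but
# disagree sharply on a response that exploits two of them.
toy_scores = {
    "plain answer": [0.5, 0.5, 0.5],
    "exploit": [1.0, 0.0, 1.0],
}
models = [lambda r, i=i: toy_scores[r][i] for i in range(3)]

for response in toy_scores:
    print(
        response,
        round(mean_reward(models, response), 3),
        round(conservative_reward(models, response), 3),
        worst_case_reward(models, response),
    )
```

In this toy setup the plain average prefers the exploiting response, while both conservative aggregates prefer the response the models agree on. That is the intuition behind using ensembles against over-optimization: high disagreement is treated as a sign that the proxy is being pushed outside the region where it is reliable.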

Papers