Implicit Reward
Implicit reward learning infers a reward signal directly from human preference data, bypassing the need for an explicitly defined reward function in reinforcement learning; it is especially relevant for aligning large language models (LLMs) with human intent. Current research focuses on improving the generalization of implicit reward models, often building on Direct Preference Optimization (DPO) and its variants, and on techniques that improve training stability and efficiency. This area is central to LLM alignment, enabling more robust and reliable AI systems while also offering insight into human preference modeling and decision-making.
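As a concrete illustration, the sketch below shows how DPO defines an implicit reward as the scaled log-probability ratio between the trained policy and a frozen reference model, and optimizes a Bradley-Terry preference loss over that reward. This is a minimal sketch of the standard DPO objective, not the implementation of any particular paper; the function and variable names (dpo_loss, policy_chosen_logps, and so on) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    The implicit reward of a response y given prompt x is
        r(x, y) = beta * (log pi_theta(y|x) - log pi_ref(y|x)),
    and the loss maximizes the Bradley-Terry likelihood that the
    preferred response receives a higher implicit reward than the
    rejected one. All *_logps are summed per-sequence token
    log-probabilities of shape (batch,).
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid of the implicit reward margin between chosen and rejected.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    return loss, chosen_rewards.detach(), rejected_rewards.detach()
```

Given per-sequence log-probabilities from the policy and the reference model, the loss is computed as `loss, r_w, r_l = dpo_loss(lp_w, lp_l, ref_lp_w, ref_lp_l, beta=0.1)`; the returned implicit rewards are the quantities typically tracked as reward margins during training.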
Papers
Eighteen papers, dated March 20, 2024 through November 6, 2024.