Reward Report
Research in this area centers on efficiently learning reward functions to guide reinforcement learning (RL) agents, particularly in complex domains such as large language models (LLMs) and robotics. Current efforts focus on improving reward model accuracy and efficiency through techniques such as active learning, parameter insertion within existing model architectures, and the use of vision-language models (VLMs) to generate dense reward functions. This work is crucial for advancing RL in safety-critical applications and for aligning AI systems more effectively with human preferences, ultimately yielding more robust and beneficial systems.
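A common starting point for the reward modeling described above is training a scalar reward head on pairwise human preferences with a Bradley-Terry loss. The sketch below, assuming PyTorch, is a minimal illustration of that recipe; the RewardModel class, its dimensions, and the synthetic preference pairs are hypothetical stand-ins for an LLM-backed reward model, not any specific paper's method.

```python
# Minimal sketch: preference-based reward-model training with a
# Bradley-Terry pairwise loss. All names and sizes are illustrative.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Tiny stand-in for an LLM-backed reward head: maps a fixed-size
    response embedding to a scalar reward."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One scalar reward per example.
        return self.net(x).squeeze(-1)

def pairwise_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry: maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Synthetic "preference pairs": embeddings of chosen vs. rejected responses.
chosen = torch.randn(32, 128)
rejected = torch.randn(32, 128)

for step in range(100):
    loss = pairwise_loss(model(chosen), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The trained scalar reward can then serve as the optimization signal for an RL fine-tuning loop; techniques like the active learning mentioned above would decide which preference pairs to query next.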
Papers
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
Zihan Liu, Yang Chen, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping
Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation
Jihao Gu, Yingyao Wang, Meng Cao, Pi Bu, Jun Song, Yancheng He, Shilong Li, Bo Zheng