Preference Datasets
Preference datasets are collections of human judgments comparing different outputs generated by large language models (LLMs), used to align these models with human values and preferences. Current research focuses on improving the efficiency and quality of these datasets, exploring methods like auction mechanisms for cost-effective data collection, metrics for dataset comparison, and techniques to reduce noise and bias. This work is crucial for developing more reliable and ethically aligned LLMs, impacting both the advancement of AI research and the development of safer, more user-friendly AI applications.
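To make the data format concrete, here is a minimal sketch of what a single pairwise preference record typically looks like, together with the Bradley-Terry style loss commonly used to train reward models on such data. The (prompt, chosen, rejected) schema is an assumption based on common open preference datasets, and the example strings and scores are hypothetical, not drawn from any paper above.

```python
import math

# A minimal sketch of a pairwise preference record, assuming the common
# (prompt, chosen, rejected) schema used by many open preference datasets.
records = [
    {
        "prompt": "Explain photosynthesis to a child.",
        "chosen": "Plants use sunlight to turn air and water into food.",
        "rejected": "Photosynthesis is glucose production via chloroplasts.",
    },
]

def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that `chosen` beats `rejected` under a
    Bradley-Terry model: -log sigmoid(r_chosen - r_rejected)."""
    margin = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-margin))

# Hypothetical reward-model scores; higher should mean "more preferred".
print(bradley_terry_loss(1.3, 0.4))  # small loss: ranking agrees with the label
print(bradley_terry_loss(0.4, 1.3))  # larger loss: ranking disagrees
```

Noise and bias in the human judgments enter directly through these labels: a mislabeled pair pushes the reward model toward the wrong ranking, which is why dataset-quality work of the kind surveyed here matters.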
Papers
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Lester James V. Miranda, Yizhong Wang, Yanai Elazar, Sachin Kumar, Valentina Pyatkin, Faeze Brahman, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Chris Yuhao Liu, Liang Zeng, Jiacai Liu, Rui Yan, Jujie He, Chaojie Wang, Shuicheng Yan, Yang Liu, Yahui Zhou