Preference Feedback
Preference feedback, the use of human-provided comparisons to guide the training and evaluation of machine learning models, aims to align AI systems with human values and preferences. Current research focuses on making preference learning more efficient and effective. It explores modeling approaches such as Bradley-Terry and regression-based reward models, Direct Preference Optimization (DPO), and generative judges, and sometimes adds signals such as response times and contextual information to make the feedback more informative. The field is important for mitigating biases and making AI systems safer and more reliable, with applications ranging from language model alignment to personalized recommendation and robot navigation.
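Two of the approaches named above can be sketched as follows. This is a minimal illustration, not any listed paper's implementation. It assumes each response already has a scalar reward score (for the Bradley-Terry reward-model loss) or a summed log-probability under both the trained policy and a frozen reference model (for the DPO loss). The function names, the default β = 0.1, and the scalar inputs are illustrative assumptions. Practical implementations work on batched tensors.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)); avoids overflow for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def bradley_terry_prob(r_a: float, r_b: float) -> float:
    """Bradley-Terry probability that response a is preferred over b,
    given scalar scores r_a and r_b: sigmoid(r_a - r_b)."""
    return math.exp(log_sigmoid(r_a - r_b))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of an observed comparison under Bradley-Terry
    (illustrative; the chosen response should receive the higher score)."""
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair. Inputs are summed token
    log-probabilities of each response under the policy and the frozen
    reference model; beta (assumed 0.1 here) scales the implicit reward."""
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the chosen response than the rejected one.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    print(bradley_terry_prob(2.0, 0.0))        # about 0.88
    print(reward_model_loss(2.0, 0.0))         # small: ranking agrees with the label
    print(dpo_loss(-4.0, -8.0, -5.0, -7.0))    # below log(2): policy moved toward chosen
```

The same sigmoid-of-a-difference form underlies both losses. The reward model learns explicit scores, while DPO treats the scaled log-probability ratio against the reference model as an implicit reward, so no separate reward model is trained.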
Papers
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo Wang, Qi Zhang, Liang Ding, Dacheng Tao
Online Bandit Learning with Offline Preference Data
Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi