Human Preference
Human preference modeling aims to align artificial intelligence systems, particularly large language models, with human values and expectations. Current research focuses on developing efficient algorithms, such as Direct Preference Optimization (DPO) and its variants, and leveraging diverse data sources including pairwise comparisons, ordinal rankings, and even gaze data to better represent the nuances of human preferences. This field is crucial for ensuring the safety, trustworthiness, and ethical deployment of AI systems across various applications, from language generation to robotics. Ongoing work emphasizes improving the robustness and fairness of preference models, addressing biases, and developing more efficient methods for incorporating human feedback.
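To make the DPO approach mentioned above concrete, here is a minimal sketch of the standard pairwise DPO objective, assuming per-response sequence log-probabilities have already been computed under the trained policy and a frozen reference model; the function name, argument names, and the beta value are illustrative choices, not taken from any of the papers listed below.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Pairwise DPO loss: push the policy to prefer the chosen response
    over the rejected one, measured relative to a frozen reference model."""
    # Implicit rewards: scaled log-ratios of policy vs. reference
    # for the preferred (chosen) and dispreferred (rejected) responses.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry-style logistic loss on the reward margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In practice each tensor holds one summed log-probability per preference pair in the batch, so the loss can be dropped into an ordinary gradient-descent training loop without a separate reward model or reinforcement-learning stage, which is the efficiency gain the summary refers to.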
Papers
A General Theoretical Paradigm to Understand Learning from Human Preferences
Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Shihan Dou, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang