Human Preference
Human preference modeling aims to align artificial intelligence systems, particularly large language models, with human values and expectations. Current research focuses on developing efficient algorithms, such as Direct Preference Optimization (DPO) and its variants, and leveraging diverse data sources including pairwise comparisons, ordinal rankings, and even gaze data to better represent the nuances of human preferences. This field is crucial for ensuring the safety, trustworthiness, and ethical deployment of AI systems across various applications, from language generation to robotics. Ongoing work emphasizes improving the robustness and fairness of preference models, addressing biases, and developing more efficient methods for incorporating human feedback.
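To make the pairwise-comparison setup behind DPO concrete, here is a minimal sketch of the DPO objective, assuming per-sequence log-probabilities from the policy and a frozen reference model are already available; the function name, tensor names, and the beta value are illustrative and not taken from any specific paper listed below.

```python
# Minimal sketch of the Direct Preference Optimization (DPO) loss on pairwise
# preference data: -log sigmoid(beta * (policy_margin - reference_margin)).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratios of the policy vs. the reference model for the preferred
    # (chosen) and dispreferred (rejected) responses.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Implied reward margin; beta controls how far the policy may drift
    # from the reference model.
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b),
                    torch.randn(b), torch.randn(b))
    print(loss.item())
```

This is only a sketch of the standard pairwise objective; the DPO variants surveyed in the papers below modify or extend this formulation in different ways.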
Papers
Can LLMs make trade-offs involving stipulated pain and pleasure states?
Geoff Keeling, Winnie Street, Martyna Stachaczyk, Daria Zakharova, Iulia M. Comsa, Anastasiya Sakovych, Isabella Logothetis, Zejia Zhang, Blaise Agüera y Arcas, Jonathan Birch
Active Preference-based Learning for Multi-dimensional Personalization
Minhyeon Oh, Seungjoon Lee, Jungseul Ok
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao, Genta Indra Winata, Anirban Das, Shi-Xiong Zhang, David D. Yao, Wenpin Tang, Sambit Sahu
Non-monotonic Extensions to Formal Concept Analysis via Object Preferences
Lucas Carr, Nicholas Leisegang, Thomas Meyer, Sebastian Rudolph
Towards Effective Counter-Responses: Aligning Human Preferences with Strategies to Combat Online Trolling
Huije Lee, Hoyun Song, Jisu Shin, Sukmin Cho, SeungYoon Han, Jong C. Park