Human Preference

Human preference modeling aims to align artificial intelligence systems, particularly large language models, with human values and expectations. Current research focuses on efficient alignment algorithms, such as Direct Preference Optimization (DPO) and its variants, which fit the policy directly to preference data rather than training a separate reward model, and on leveraging diverse feedback sources, including pairwise comparisons, ordinal rankings, and even gaze data, to capture the nuances of human preferences. This work is central to the safe, trustworthy, and ethical deployment of AI systems across applications ranging from language generation to robotics. Ongoing efforts emphasize improving the robustness and fairness of preference models, addressing biases, and developing more efficient methods for incorporating human feedback.
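
To make the pairwise preference objective concrete, here is a minimal PyTorch sketch of the standard DPO loss. The function and tensor names, and the default β of 0.1, are illustrative assumptions rather than any specific paper's implementation; it assumes per-sequence log-probabilities have already been computed under the trained policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of (chosen, rejected) response pairs.

    Each argument holds per-sequence log-probabilities (summed over
    tokens) of a response under either the policy being trained or
    the frozen reference model.
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry objective: maximize the probability that the
    # chosen response outranks the rejected one, i.e. minimize
    # -log sigmoid(reward margin).
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities for a batch of 4 pairs.
logps = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*logps))
```

Because the reward appears only through the log-probability ratio, no explicit reward model is trained; the preference signal shapes the policy directly, which is the efficiency gain DPO-style methods offer over classic RLHF pipelines.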

Papers