Preference Feedback
Preference feedback, the use of human-provided comparisons to guide the training and evaluation of machine learning models, aims to align AI systems with human values and preferences. Current research focuses on improving the efficiency and effectiveness of preference learning, exploring approaches such as Bradley-Terry and regression-based reward models, Direct Preference Optimization (DPO), and generative judges, often incorporating response times and contextual information to enrich the feedback signal. This work is crucial for mitigating biases and ensuring AI systems are safe, reliable, and beneficial, with applications ranging from language model alignment to personalized recommendation and robot navigation.
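To make the connection between these approaches concrete, here is a minimal, illustrative sketch (function names and toy numbers are assumptions, not taken from any specific paper) of how a Bradley-Terry preference probability and the per-example DPO loss relate: DPO treats the policy's log-probability ratio against a frozen reference model as an implicit reward and plugs that margin into the same Bradley-Terry objective.

```python
import numpy as np

def sigmoid(x):
    """Numerically stable logistic function."""
    return np.where(x >= 0, 1.0 / (1.0 + np.exp(-x)), np.exp(x) / (1.0 + np.exp(x)))

def bradley_terry_prob(reward_chosen, reward_rejected):
    """Bradley-Terry probability that the 'chosen' response is preferred,
    given scalar reward estimates for the two responses."""
    return sigmoid(reward_chosen - reward_rejected)

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss: the policy's implicit reward is its log-probability
    ratio against a frozen reference model, scored with a Bradley-Terry objective."""
    implicit_reward_margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    return -np.log(sigmoid(implicit_reward_margin))

# Toy usage: the policy slightly prefers the chosen response relative to the
# reference model, so the loss falls below log(2) ≈ 0.693.
print(bradley_terry_prob(1.2, 0.4))                    # ~0.69
print(dpo_loss(-12.0, -15.0, -12.5, -14.8, beta=0.1))  # ~0.66
```

Under this framing, a reward model fits the Bradley-Terry scores explicitly, while DPO optimizes the policy directly and lets the reference-adjusted log-probability ratio play the role of the reward.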