Preference Feedback
Preference feedback, the use of human-provided comparisons to guide the training and evaluation of machine learning models, aims to align AI systems with human values and preferences. Current research focuses on improving the efficiency and effectiveness of preference learning, exploring model architectures such as Bradley-Terry and regression models, Direct Preference Optimization (DPO), and generative judges, and often incorporating response times and contextual information to enrich the feedback signal. This work is crucial for mitigating biases and ensuring AI systems are safe, reliable, and beneficial, with applications ranging from language model alignment to personalized recommendation and robot navigation.
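As a rough sketch of two of the formulations named above, the Python below computes the Bradley-Terry probability that one response is preferred over another and the per-pair DPO loss against a frozen reference model. The function names, the beta value, and the numeric log-probabilities are illustrative assumptions, not drawn from any particular paper.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bradley_terry_prob(reward_a: float, reward_b: float) -> float:
    """P(response A preferred over response B) under a Bradley-Terry model:
    sigma(r_A - r_B), where r_A and r_B are scalar reward scores."""
    return sigmoid(reward_a - reward_b)


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair: the policy is pushed to widen
    the log-probability margin of the chosen response over the rejected one,
    measured relative to a frozen reference model."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(sigmoid(beta * margin))


if __name__ == "__main__":
    # Hypothetical reward scores and log-probabilities, for illustration only.
    print(bradley_terry_prob(1.2, 0.4))          # ~0.69: A preferred about 69% of the time
    print(dpo_loss(-12.0, -15.0, -13.0, -14.5))  # loss shrinks as the chosen margin grows
```

In practice the log-probabilities would be summed token-level scores from the policy and reference language models, and the loss would be averaged over a batch of preference pairs.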