Preference Feedback
Preference feedback, the use of human-provided comparisons to guide the training and evaluation of machine learning models, aims to align AI systems with human values. Current research focuses on making preference learning more efficient and effective, exploring model architectures such as Bradley-Terry and regression-based reward models, Direct Preference Optimization (DPO), and generative judges, often incorporating signals such as response times and contextual information to enrich the feedback. This work is crucial for mitigating biases and ensuring AI systems are safe, reliable, and beneficial, with applications ranging from language model alignment to personalized recommendation and robot navigation.
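To make the two most common formulations concrete, the sketch below shows the Bradley-Terry preference probability and the DPO loss it induces. This is a minimal illustration assuming PyTorch; the function names and the choice of beta=0.1 are illustrative, not taken from any specific paper in this collection.

```python
import torch
import torch.nn.functional as F

def bradley_terry_prob(reward_chosen: torch.Tensor,
                       reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry model: probability that the chosen response
    beats the rejected one, given scalar reward scores for each."""
    return torch.sigmoid(reward_chosen - reward_rejected)

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss.

    Each argument is the summed log-probability of a response under
    the trainable policy or the frozen reference model; beta scales
    the implicit KL penalty against the reference.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The log-ratio difference acts as an implicit reward margin,
    # scored with a Bradley-Terry likelihood (sigmoid of the margin).
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```

The key design point DPO exploits is that the reward model never needs to be trained explicitly: the policy's own log-probability ratios against the reference play the role of rewards inside the Bradley-Terry likelihood.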