Preference Feedback
Preference feedback is the use of human-provided comparisons between model outputs, rather than absolute labels, to guide the training and evaluation of machine learning models, with the goal of aligning AI systems with human values and preferences. Current research focuses on making preference learning more efficient and effective, exploring approaches such as Bradley-Terry and regression-based reward models, Direct Preference Optimization (DPO), and generative judges, and often incorporating auxiliary signals such as response times and contextual information to enrich the feedback. The field is crucial for mitigating bias and for ensuring AI systems are safe, reliable, and beneficial, with applications ranging from language model alignment to personalized recommendation and robot navigation.
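To make the two most common formulations concrete, the sketch below shows the Bradley-Terry preference probability and a per-pair DPO loss. It is a minimal illustration, not any specific paper's implementation: the function names, the beta value, and the assumption that log-probabilities are already summed over response tokens are all choices made here for clarity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bradley_terry_prob(reward_chosen, reward_rejected):
    """Bradley-Terry model: P(chosen preferred over rejected)
    = sigmoid(r_w - r_l), where r_* are scalar reward scores."""
    return sigmoid(reward_chosen - reward_rejected)

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss on a single preference pair.

    logp_* are the summed token log-probabilities of each response
    under the policy being trained; ref_logp_* are the same
    quantities under a frozen reference model. beta scales the
    implicit reward (0.1 is an illustrative default, not prescribed
    by the source).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -np.log(sigmoid(margin))

# Example: a pair where the policy already slightly prefers the
# chosen response relative to the reference model.
print(bradley_terry_prob(1.2, 0.4))            # ~0.69
print(dpo_loss(-10.0, -12.0, -10.5, -11.5))    # small positive loss
```

The key design difference the sketch highlights: a Bradley-Terry reward model scores responses explicitly and is trained separately from the policy, whereas DPO folds the reward into the policy's own log-probability ratios, removing the separate reward-modeling stage.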