Preference Label
Preference labeling in machine learning, particularly for large language models (LLMs), is the task of capturing human preferences efficiently and accurately so that they can guide model training and evaluation. Current research is moving beyond simple binary preferences toward richer representations such as continuous or distributional labels. These labels are typically used with methods such as Direct Preference Optimization (DPO) and reinforcement learning (RL), including RL from human feedback (RLHF) and its more scalable counterpart, RL from AI feedback (RLAIF). This work matters for aligning LLMs with human values and improving their performance across tasks, and it addresses challenges such as bias in evaluation metrics and the high cost of human annotation.
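To make the difference between binary and continuous labels concrete, the following is a minimal sketch of a DPO-style loss that accepts a soft preference label. It is an illustration under stated assumptions, not the method of any specific paper listed here: the function names (`dpo_margin`, `soft_dpo_loss`), the `beta` default, and the cross-entropy form of the soft-label loss are all choices made for this example. Setting the label to 1.0 gives the usual binary DPO loss.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_margin(policy_logp_a: float, policy_logp_b: float,
               ref_logp_a: float, ref_logp_b: float,
               beta: float = 0.1) -> float:
    """Scaled difference of log-ratios between responses A and B.

    Inputs are sequence log-probabilities under the trained policy and a
    frozen reference model. beta=0.1 is an assumed, illustrative value.
    """
    return beta * ((policy_logp_a - ref_logp_a) - (policy_logp_b - ref_logp_b))


def soft_dpo_loss(margin: float, p_a_preferred: float = 1.0) -> float:
    """Cross-entropy between a soft label and the implied preference.

    p_a_preferred is the annotated probability that A is better than B:
    1.0 is a hard binary label, 0.5 means the annotators were split.
    This soft-label form is an assumption made for illustration.
    """
    return -(p_a_preferred * log_sigmoid(margin)
             + (1.0 - p_a_preferred) * log_sigmoid(-margin))


if __name__ == "__main__":
    # Hypothetical log-probabilities for one prompt with two responses.
    m = dpo_margin(policy_logp_a=-1.0, policy_logp_b=-2.0,
                   ref_logp_a=-1.5, ref_logp_b=-1.5)
    print("binary label loss:", soft_dpo_loss(m, 1.0))
    print("soft label (0.7) loss:", soft_dpo_loss(m, 0.7))
```

With a soft label, the loss no longer rewards pushing the margin toward infinity: for a label of 0.7, it is minimized when the model's implied preference probability is also 0.7. This is one way a continuous label can carry annotator uncertainty into training.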
Papers
November 20, 2024
October 17, 2024
October 11, 2024
September 10, 2024
July 23, 2024
July 22, 2024
July 1, 2024
June 25, 2024
June 14, 2024
April 22, 2024
March 18, 2024
March 10, 2024
February 3, 2024
October 16, 2023
September 1, 2023