Robust Preference Optimization
Robust preference optimization aims to train AI models, particularly large language models (LLMs), to reliably and consistently reflect human preferences, even when the preference data is noisy or incomplete. Current research focuses on algorithms and model architectures, such as Direct Preference Optimization (DPO) and its variants, that remain resilient to inconsistencies in human feedback, often incorporating techniques like reward-model distillation or adversarial training to improve robustness. This work is crucial for building more reliable and ethically aligned AI systems, addressing limitations of current preference-learning methods and improving the safety and effectiveness of LLMs in real-world applications.
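As a concrete illustration, here is a minimal PyTorch sketch of the standard DPO objective alongside a label-smoothed variant sometimes used to hedge against noisy preference labels (in the spirit of conservative DPO). The function names, signatures, and the `label_noise` parameter are illustrative assumptions for this sketch, not the API of any specific paper or library.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).

    Each argument is a batch of summed token log-probabilities for the chosen or
    rejected response under the trainable policy or the frozen reference model.
    """
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = beta * (policy_logratios - ref_logratios)
    return -F.logsigmoid(logits).mean()

def label_smoothed_dpo_loss(policy_chosen_logps: torch.Tensor,
                            policy_rejected_logps: torch.Tensor,
                            ref_chosen_logps: torch.Tensor,
                            ref_rejected_logps: torch.Tensor,
                            beta: float = 0.1,
                            label_noise: float = 0.1) -> torch.Tensor:
    """Robustified variant: assume each preference label is flipped with
    probability `label_noise`, and average the loss over both orientations.
    """
    logits = beta * ((policy_chosen_logps - policy_rejected_logps)
                     - (ref_chosen_logps - ref_rejected_logps))
    return (-(1.0 - label_noise) * F.logsigmoid(logits)
            - label_noise * F.logsigmoid(-logits)).mean()
```

The smoothing term keeps the gradient bounded when a preference pair is mislabeled, which is one simple way such methods trade a small amount of fit on clean pairs for resilience to annotation noise.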