Preference Optimization
Preference optimization (PO) aims to align large language models (LLMs) and other AI systems with human preferences, steering their behavior and outputs toward what people judge helpful and safe. Current research focuses on refining existing algorithms such as Direct Preference Optimization (DPO) and its variants, exploring techniques such as sparse token weighting, bidirectional feedback, and uncertainty estimation to improve efficiency and robustness. This work is crucial for building safer and more beneficial AI systems, shaping both the development of more reliable models and the ethical considerations surrounding their deployment.
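To make the DPO objective mentioned above concrete, the sketch below shows a minimal PyTorch implementation of its pairwise loss, assuming per-response summed log-probabilities from the policy and a frozen reference model are already available; the function and variable names are illustrative and not taken from any of the papers listed here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for a batch of preference pairs.

    Each tensor holds the summed token log-probability of a full response
    under the policy or the frozen reference model, shape (batch,).
    """
    # Implicit rewards: how far the policy has moved from the reference
    # on the preferred vs. dispreferred response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Bradley-Terry style objective: push the chosen implicit reward
    # above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities standing in for real model outputs.
if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b),
                    torch.randn(b), torch.randn(b))
    print(loss.item())
```

Variants surveyed in this area typically modify this loss, for example by reweighting tokens or adjusting beta per example, rather than replacing the pairwise formulation.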
Papers
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Jian Li, Haojing Huang, Yujia Zhang, Pengfei Xu, Xi Chen, Rui Song, Lida Shi, Jingwen Wang, Hao Xu
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
Cheolhun Jang
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Benjamin Feuer, Micah Goldblum, Teresa Datta, Sanjana Nambiar, Raz Besaleli, Samuel Dooley, Max Cembalest, John P. Dickerson
Orthogonal Finetuning for Direct Preference Optimization
Chenxu Yang, Ruipeng Jia, Naibin Gu, Zheng Lin, Siyuan Chen, Chao Pang, Weichong Yin, Yu Sun, Hua Wu, Weiping Wang