Preference Optimization
Preference optimization (PO) aims to align large language models (LLMs) and other AI systems with human preferences, steering their outputs toward responses people judge as better. Current research focuses on refining existing algorithms such as Direct Preference Optimization (DPO) and its variants, exploring techniques like sparse token weighting, bidirectional feedback, and uncertainty estimation to improve efficiency and robustness. The field is central to building safer and more beneficial AI systems, shaping both the reliability of deployed models and the ethical considerations surrounding their deployment.
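For context, the standard DPO objective that many of the papers below build on trains the policy directly on preference pairs, without fitting a separate reward model: it maximizes the margin between the implicit rewards of the chosen and rejected responses relative to a frozen reference model. The sketch below is a minimal PyTorch rendering of that loss, assuming precomputed per-response log-probabilities; the function and argument names are illustrative and not taken from any of the listed papers.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss (Rafailov et al., 2023) over a batch of preference pairs.

    Each argument is a tensor of summed token log-probabilities for the chosen
    or rejected response, under the trainable policy or the frozen reference
    model. `beta` scales the implicit KL penalty toward the reference; 0.1 is
    an illustrative default, not a recommendation from the listed papers.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the implicit reward of the chosen response above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

The variants surveyed here typically modify this objective, for example by reweighting token-level contributions or constraining which features drive the reward margin, rather than replacing it outright.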
Papers
Aligning Visual Contrastive learning models via Preference Optimization
Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh, Mahyar JafariNodeh, Sepehr kazemi, Simon Gottschalk
Direct Preference Optimization Using Sparse Feature-Level Constraints
Qingyu Yin, Chak Tou Leong, Hongbo Zhang, Minjun Zhu, Hanqi Yan, Qiang Zhang, Yulan He, Wenjie Li, Jun Wang, Yue Zhang, Linyi Yang
Aligning Large Language Models via Self-Steering Optimization
Hao Xiang, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun, Jingren Zhou, Junyang Lin
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Models Alignment
Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhang, Jing Huo, Weijie J. Su, Yaodong Yang