Preference Optimization
Preference optimization (PO) aims to align large language models (LLMs) and other AI systems with human preferences, steering their behavior and outputs toward what people actually want. Current research focuses on refining existing algorithms such as Direct Preference Optimization (DPO) and its variants, exploring techniques like sparse token weighting, bidirectional feedback, and the incorporation of uncertainty estimates to improve efficiency and robustness. This field is crucial for building safer and more beneficial AI systems, impacting both the development of more reliable models and the ethical considerations surrounding their deployment.
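To make the DPO objective mentioned above concrete, the sketch below shows the standard pairwise DPO loss in PyTorch. The function name dpo_loss and its log-probability arguments are illustrative assumptions for this overview, not code from any of the papers listed further down.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Minimal sketch of the standard pairwise DPO loss.

    Each *_logps argument is a (batch,) tensor of summed per-sequence
    log-probabilities under the trainable policy or the frozen reference
    model; beta controls how far the policy may drift from the reference.
    """
    chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
    # The preferred (chosen) response should out-score the rejected one.
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
pol_c, pol_r = torch.randn(4), torch.randn(4)
ref_c, ref_r = torch.randn(4), torch.randn(4)
print(dpo_loss(pol_c, pol_r, ref_c, ref_r))
```

Many of the variants surveyed here (e.g., weighted, multi-turn, or conditional preference optimization) can be read as modifications to how these per-sequence margins are computed or reweighted.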
Papers
Robust Reinforcement Learning from Corrupted Human Feedback
Alexander Bukharin, Ilgee Hong, Haoming Jiang, Zichong Li, Qingru Zhang, Zixuan Zhang, Tuo Zhao
Inference Time Alignment with Reward-Guided Tree Search
Chia-Yu Hung, Navonil Majumder, Ambuj Mehrish, Soujanya Poria
Direct Multi-Turn Preference Optimization for Language Agents
Wentao Shi, Mengqi Yuan, Junkang Wu, Qifan Wang, Fuli Feng
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Fei Wang, Wenxuan Zhou, James Y. Huang, Nan Xu, Sheng Zhang, Hoifung Poon, Muhao Chen
WPO: Enhancing RLHF with Weighted Preference Optimization
Wenxuan Zhou, Ravi Agrawal, Shujian Zhang, Sathish Reddy Indurthi, Sanqiang Zhao, Kaiqiang Song, Silei Xu, Chenguang Zhu
Discovering Preference Optimization Algorithms with and for Large Language Models
Chris Lu, Samuel Holt, Claudio Fanconi, Alex J. Chan, Jakob Foerster, Mihaela van der Schaar, Robert Tjarko Lange
Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
Duanyu Feng, Bowen Qin, Chen Huang, Youcheng Huang, Zheng Zhang, Wenqiang Lei
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
Ilgee Hong, Zichong Li, Alexander Bukharin, Yixiao Li, Haoming Jiang, Tianbao Yang, Tuo Zhao
Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models
Songtao Liu, Hanjun Dai, Yue Zhao, Peng Liu
Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives
Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen