Preference Alignment
Preference alignment in large language models (LLMs) focuses on steering model outputs toward human preferences, improving helpfulness, harmlessness, and overall output quality. Current research emphasizes techniques such as Direct Preference Optimization (DPO) and its variants, which often incorporate token-level weighting or importance sampling to improve efficiency and to address issues such as update regression. The field is crucial for responsible LLM deployment: by ensuring models generate outputs consistent with human values and expectations, it affects applications ranging from translation and text-to-speech to healthcare and robotics.
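To make the DPO objective mentioned above concrete, here is a minimal sketch of the per-pair DPO loss on scalar sequence log-probabilities. This is an illustrative simplification, not any paper's implementation: the function name, argument names, and the default `beta` value are assumptions, and real training code operates on batched token-level log-probabilities from a policy and a frozen reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Sketch of the DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") responses under the trainable policy and
    under the frozen reference model. `beta` scales the implicit reward.
    """
    # Implicit rewards: log-ratio of policy to reference, scaled by beta.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Bradley-Terry style objective: -log sigmoid(reward margin).
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy equals the reference model, both implicit rewards are zero and the loss is log 2 ≈ 0.693; raising the policy's log-probability on the chosen response relative to the reference drives the loss down. Token-level weighting variants, as noted in the overview, replace the summed sequence log-probabilities with per-token weighted sums.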
Papers
Inference time LLM alignment in single and multidomain preference spectrum
Sadat Shahriar, Zheng Qi, Nikolaos Pappas, Srikanth Doss, Monica Sunkara, Kishaloy Halder, Manuel Mager, Yassine Benajiba
Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences
Weijian Luo
GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets
Oh Joon Kwon, Daiki E. Matsunaga, Kee-Eung Kim
Nova: A Practical and Advanced Alignment
Mingan Lin, Fan Yang, Yanjun Shen, Haoze Sun, Tianpeng Li, Tao Zhang, Chenzheng Zhu, Tao Zhang, Miao Zheng, Xu Li, Yijie Zhou, Mingyang Chen, Yanzhao Qin, Youquan Li, Hao Liang, Fei Li, Yadong Li, Mang Wang, Guosheng Dong, Kun Fang, Jianhua Xu, Bin Cui, Wentao Zhang, Zenan Zhou, Weipeng Chen
Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback
Dennis Hein, Zhihong Chen, Sophie Ostmeier, Justin Xu, Maya Varma, Eduardo Pontes Reis, Arne Edward Michalson, Christian Bluethgen, Hyun Joo Shin, Curtis Langlotz, Akshay S Chaudhari
Uncovering Factor Level Preferences to Improve Human-Model Alignment
Juhyun Oh, Eunsu Kim, Jiseon Kim, Wenda Xu, Inha Cha, William Yang Wang, Alice Oh