Preference Alignment
Preference alignment in large language models (LLMs) aims to bring model outputs in line with human preferences, improving helpfulness, harmlessness, and overall quality. Current research emphasizes techniques such as Direct Preference Optimization (DPO) and its variants, often adding token-level weighting or importance sampling to improve efficiency and address issues such as update regression. The field is central to responsible LLM deployment, with applications ranging from translation and text-to-speech to healthcare and robotics.
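As a rough illustration of the DPO objective mentioned above, the following is a minimal sketch of the standard per-example DPO loss, assuming sequence-level log-probabilities of the chosen and rejected responses are already available from the policy and a frozen reference model. The function name, argument names, and the default `beta` are illustrative assumptions and are not taken from any of the papers listed below.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    All inputs are summed log-probabilities of a full response (hypothetical
    values here); beta controls how strongly the policy is tied to the reference.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably
    # for both signs of the margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative log-probabilities only: the policy has shifted probability
# toward the chosen response relative to the reference.
loss = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

When the policy matches the reference, the margin is zero and the loss equals log 2; the loss falls below that value as the policy favors the chosen response more than the reference does. Variants with token-level weighting would replace the summed log-probabilities with weighted per-token sums, which this sketch does not attempt.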
Papers
Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback
Dennis Hein, Zhihong Chen, Sophie Ostmeier, Justin Xu, Maya Varma, Eduardo Pontes Reis, Arne Edward Michalson, Christian Bluethgen, Hyun Joo Shin, Curtis Langlotz, Akshay S Chaudhari
Uncovering Factor Level Preferences to Improve Human-Model Alignment
Juhyun Oh, Eunsu Kim, Jiseon Kim, Wenda Xu, Inha Cha, William Yang Wang, Alice Oh
Understanding Alignment in Multimodal LLMs: A Comprehensive Study
Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker
PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning
Shiva Kumar Pentyala, Zhichao Wang, Bin Bi, Kiran Ramnath, Xiang-Bo Mao, Regunathan Radhakrishnan, Sitaram Asur, Na Cheng
Aligning Diffusion Models with Noise-Conditioned Perception
Alexander Gambashidze, Anton Kulikov, Yuriy Sosnin, Ilya Makarov