Preference Pair
A preference pair consists of two candidate outputs for the same input, one of which is preferred over the other; such pairs are the fundamental data for aligning artificial intelligence models with human values. Current research focuses on efficient and robust methods for learning from these pairs, moving beyond simple reward modeling toward richer representations of complex preference structures, often using algorithms like Direct Preference Optimization (DPO) and its variants. This work is crucial for improving the safety and reliability of AI systems by enabling better alignment with human intentions across diverse applications, from language models to image generation and beyond.
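To make the role of a preference pair concrete, the sketch below shows the standard DPO objective computed over a small batch of pairs. It is a minimal illustration, not code from any of the papers listed: it assumes the per-example sequence log-probabilities of the chosen and rejected responses under the policy and a frozen reference model have already been computed, and all function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Each tensor holds per-example log p(response | prompt): the policy's and
    the frozen reference model's scores for the preferred ("chosen") and
    dispreferred ("rejected") response in each pair.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # DPO maximizes the margin between the two log-ratios, scaled by beta.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Toy usage with made-up log-probabilities for a batch of three pairs.
policy_chosen = torch.tensor([-12.3, -10.1, -15.0])
policy_rejected = torch.tensor([-13.0, -11.5, -14.2])
ref_chosen = torch.tensor([-12.5, -10.4, -15.1])
ref_rejected = torch.tensor([-12.9, -11.2, -14.5])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

Minimizing this loss increases the policy's relative likelihood of the preferred response over the rejected one, without training a separate reward model.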
Papers
Is Free Self-Alignment Possible?
Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang