Preference Learning
Preference learning aims to align artificial intelligence models, particularly large language models, with human preferences by learning from human feedback on model outputs. Current research focuses on efficient alignment algorithms such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO), often incorporating model architectures like diffusion models and variational autoencoders to handle complex preference structures, including intransitive preferences. This work is central to building trustworthy and beneficial AI systems: it improves performance across tasks and helps ensure alignment with human values in applications ranging from robotics to natural language processing.
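To make the direct preference optimization objective mentioned above concrete, here is a minimal sketch of the per-example DPO loss on a (chosen, rejected) response pair. The function name, argument names, and the default `beta` are illustrative assumptions, not taken from the papers listed on this page:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-example DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under
    the trainable policy and a frozen reference model. `beta` scales
    the implicit reward; 0.1 is a common illustrative default.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the scaled margin (Bradley-Terry likelihood).
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

With a zero margin the loss is log 2; as the policy's preference for the chosen response grows relative to the reference, the loss decreases toward zero. A real implementation would compute these log-probabilities with a tensor library and average over a batch.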
Papers
Few-shot In-Context Preference Learning Using Large Language Models
Chao Yu, Hong Lu, Jiaxuan Gao, Qixin Tan, Xinting Yang, Yu Wang, Yi Wu, Eugene Vinitsky
Aligning Large Language Models via Self-Steering Optimization
Hao Xiang, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun, Jingren Zhou, Junyang Lin