Human Alignment
Human alignment research in artificial intelligence aims to bring the behavior and outputs of large language models (LLMs) and other AI systems into agreement with human values and preferences. Current work emphasizes methods such as reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and contrastive learning, often drawing on diverse data sources such as eye-tracking signals and preference rankings to improve model training and evaluation. This research is crucial for ensuring the safety, reliability, and beneficial use of increasingly powerful AI systems, informing both the development of more trustworthy AI and the broader understanding of human-computer interaction.
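To make the preference-based methods mentioned above concrete, the sketch below shows a DPO-style pairwise preference loss in PyTorch. It assumes per-sequence log-probabilities under the policy and a frozen reference model have already been computed; the function name `dpo_loss` and the hyperparameter `beta` are illustrative choices, not taken from the listed papers.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO-style loss on a batch of preference pairs.

    Each argument is a 1-D tensor of per-sequence log-probabilities
    (summed over tokens) under the policy or the frozen reference model.
    """
    # Log-ratio of policy vs. reference for the preferred and dispreferred responses.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Encourage a positive margin between the two ratios, scaled by beta.
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Example with random log-probabilities for a batch of 4 preference pairs.
if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
    print(loss.item())
```

The key design point this illustrates is that DPO optimizes the policy directly on preference pairs, replacing the explicit reward model and RL loop used in RLHF with a single supervised-style objective.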
Papers
Human Alignment of Large Language Models through Online Preference Optimisation
Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot
An Analysis of Human Alignment of Latent Diffusion Models
Lorenz Linhardt, Marco Morik, Sidney Bender, Naima Elosegui Borras