Alignment Problem
The alignment problem in artificial intelligence concerns ensuring that advanced models, particularly large language models (LLMs) and diffusion models, behave in ways consistent with human values and intentions. Current research emphasizes improving reward models, developing more robust evaluation metrics (moving beyond deterministic point estimates to probabilistic frameworks), and exploring alignment techniques such as preference optimization, knowledge distillation, and contrastive learning, applied in both fine-tuning and training-free settings. Addressing the alignment problem is crucial for the safe and ethical deployment of powerful AI systems across diverse applications, from healthcare and drug discovery to robotics and social media moderation.
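To make the preference-optimization idea concrete, below is a minimal sketch of a direct preference optimization (DPO)-style loss in PyTorch, one common instance of preference optimization. It is an illustrative assumption, not code from any of the listed papers; the function name, the beta value, and the toy tensors are placeholders.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO-style loss over per-sequence log-probabilities.

    Each argument is a tensor of summed token log-probabilities for a batch of
    (chosen, rejected) response pairs under the trainable policy or the frozen
    reference model.
    """
    # Log-ratio of policy vs. frozen reference for each response
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer the chosen response more strongly than the reference does
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()


# Toy usage with random log-probabilities for a batch of 4 preference pairs
if __name__ == "__main__":
    torch.manual_seed(0)
    pol_c, pol_r = torch.randn(4), torch.randn(4)
    ref_c, ref_r = torch.randn(4), torch.randn(4)
    print(dpo_loss(pol_c, pol_r, ref_c, ref_r))
```

In this formulation the frozen reference model acts as a regularizer: the loss rewards increasing the policy's margin between preferred and rejected responses only relative to the reference, which keeps the fine-tuned model from drifting far from its starting distribution.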
Papers
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment
Yingxiu Zhao, Bowen Yu, Binyuan Hui, Haiyang Yu, Fei Huang, Yongbin Li, Nevin L. Zhang
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li
Exploring the Relationship between Alignment and Cross-lingual Transfer in Multilingual Transformers
Félix Gaschi, Patricio Cerda, Parisa Rastin, Yannick Toussaint
Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency
Yuxuan Wang, Hong Lyu