Alignment Problem
The alignment problem in artificial intelligence is the challenge of ensuring that advanced models, particularly large language models (LLMs) and diffusion models, behave in ways consistent with human values and intentions. Current research emphasizes improving reward models, developing more robust evaluation metrics (moving from deterministic point estimates to probabilistic frameworks), and exploring alignment techniques such as preference optimization, knowledge distillation, and contrastive learning, often applied within fine-tuning or training-free frameworks. Addressing the alignment problem is crucial for the safe and ethical deployment of powerful AI systems in applications ranging from healthcare and drug discovery to robotics and social media moderation.
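As a rough illustration of the reward modeling and preference optimization mentioned above, the Python sketch below computes a pairwise preference loss in the Bradley-Terry style, a formulation commonly used when training reward models on human comparisons. The function names and the example reward values are assumptions chosen for illustration; they are not taken from any of the papers listed on this page.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response well above the rejected one, and large when it does not.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is never called with a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs):
    """Average the pairwise loss over (reward_chosen, reward_rejected) pairs."""
    return sum(preference_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward scores for three (chosen, rejected) response pairs.
    example_pairs = [(2.0, 0.5), (1.0, 1.0), (-0.5, 1.5)]
    print(mean_preference_loss(example_pairs))
```

When both responses receive the same reward, the loss equals log 2 (about 0.693), which corresponds to the model having no preference. Training pushes the loss toward zero by widening the margin in favor of the preferred response.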
Papers
SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment
Yuchun Fan, Yongyu Mu, Yilin Wang, Lei Huang, Junhao Ruan, Bei Li, Tong Xiao, Shujian Huang, Xiaocheng Feng, Jingbo Zhu
Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment
Prashant Trivedi, Souradip Chakraborty, Avinash Reddy, Vaneet Aggarwal, Amrit Singh Bedi, George K. Atia
Alignment faking in large language models
Ryan Greenblatt, Carson Denison, Benjamin Wright, Fabien Roger, Monte MacDiarmid, Sam Marks, Johannes Treutlein, Tim Belonax, Jack Chen, David Duvenaud, Akbir Khan, Julian Michael, Sören Mindermann, Ethan Perez, Linda Petrini, Jonathan Uesato, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Evan Hubinger
Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization
Guanghan Li, Xun Zhang, Yufei Zhang, Yifan Yin, Guojun Yin, Wei Lin
Gramian Multimodal Representation Learning and Alignment
Giordano Cicchetti, Eleonora Grassucci, Luigi Sigillo, Danilo Comminiello
Aligning Visual and Semantic Interpretability through Visually Grounded Concept Bottleneck Models
Patrick Knab, Katharina Prasse, Sascha Marton, Christian Bartelt, Margret Keuper
Universal Domain Adaptive Object Detection via Dual Probabilistic Alignment
Yuanfan Zheng, Jinlin Wu, Wuyang Li, Zhen Chen