Alignment Problem
The alignment problem in artificial intelligence focuses on ensuring that advanced models, particularly large language models (LLMs) and diffusion models, behave in ways consistent with human values and intentions. Current research emphasizes improving reward models, developing more robust evaluation metrics (moving beyond deterministic point estimates to probabilistic frameworks), and exploring various alignment techniques, including preference optimization, knowledge distillation, and contrastive learning, often applied within fine-tuning or training-free frameworks. Successfully addressing the alignment problem is crucial for the safe and ethical deployment of powerful AI systems across diverse applications, ranging from healthcare and drug discovery to robotics and social media moderation.
Papers
LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds
James Beetham, Souradip Chakraborty, Mengdi Wang, Furong Huang, Amrit Singh Bedi, Mubarak Shah
Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation
Manish Bhattarai, Minh Vu, Javier E. Santos, Ismael Boureima, Daniel O' Malley
EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation
Yongxin Wang, Meng Cao, Haokun Lin, Mingfei Han, Liang Ma, Jin Jiang, Yuhao Cheng, Xiaodan Liang
KNN-MMD: Cross Domain Wireless Sensing via Local Distribution Alignment
Zijian Zhao, Zhijie Cai, Tingwei Chen, Xiaoyang Li, Hang Li, Qimei Chen, Guangxu Zhu
Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment
Dongfang Zhao
Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment
Yizhi Song, Liu He, Zhifei Zhang, Soo Ye Kim, He Zhang, Wei Xiong, Zhe Lin, Brian Price, Scott Cohen, Jianming Zhang, Daniel Aliaga
Learning Chemical Reaction Representation with Reactant-Product Alignment
Kaipeng Zeng, Xianbin Liu, Yu Zhang, Xiaokang Yang, Yaohui Jin, Yanyan Xu
Self-supervised Video Instance Segmentation Can Boost Geographic Entity Alignment in Historical Maps
Xue Xia, Randall Balestriero, Tao Zhang, Lorenz Hurni
Safe to Serve: Aligning Instruction-Tuned Models for Safety and Helpfulness
Avinash Amballa, Durga Sandeep Saluru, Gayathri Akkinapalli, Abhishek Sureddy, Akshay Kumar Sureddy