Alignment Problem
The alignment problem in artificial intelligence is the challenge of ensuring that advanced models, particularly large language models (LLMs) and diffusion models, behave in ways consistent with human values and intentions. Current research concentrates on three areas. The first is improving reward models. The second is developing more robust evaluation metrics that replace deterministic point estimates with probabilistic frameworks. The third is alignment techniques such as preference optimization, knowledge distillation, and contrastive learning, which are applied either during fine-tuning or in training-free settings. Addressing the alignment problem is crucial for deploying powerful AI systems safely and ethically in applications ranging from healthcare and drug discovery to robotics and social media moderation.
Papers
Can training neural language models on a curriculum with developmentally plausible data improve alignment with human reading behavior?
Aryaman Chobey, Oliver Smith, Anzi Wang, Grusha Prasad
DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image
Daoyi Gao, Dávid Rozenberszki, Stefan Leutenegger, Angela Dai
Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue
Aron Molnar, Jaap Jumelet, Mario Giulianelli, Arabella Sinclair
Adapting pretrained speech model for Mandarin lyrics transcription and alignment
Jun-You Wang, Chon-In Leong, Yu-Chen Lin, Li Su, Jyh-Shing Roger Jang
Alignment is not sufficient to prevent large language models from generating harmful information: A psychoanalytic perspective
Zi Yin, Wei Ding, Jia Liu
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Tianhao Hu, Peixin Cao, Nan Du, Xiaolong Li