Aligned Model
Research on aligned models aims to build artificial intelligence systems whose behavior and internal representations closely match human preferences and understanding. Current work focuses on improving alignment through iterative self-evaluation, efficient extrapolation from pre-trained models, and bootstrapping methods that reduce reliance on expensive human annotation. These advances are crucial for improving the safety, robustness, and generalizability of AI, particularly in applications involving complex tasks and limited data.
Papers
October 24, 2024
Inference time LLM alignment in single and multidomain preference spectrum
Sadat Shahriar, Zheng Qi, Nikolaos Pappas, Srikanth Doss, Monica Sunkara, Kishaloy Halder, Manuel Mager, Yassine Benajiba
October 11, 2024
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang, Pengfei Liu, Rui Wang