Alignment Performance
Alignment performance in large language models (LLMs) and other AI systems measures how well model outputs match human intentions and values, encompassing safety, fairness, and adherence to social norms. Current research emphasizes improving alignment through techniques such as reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and in-context learning (ICL), often introducing novel model architectures and algorithms to improve efficiency and robustness. These advances are crucial for responsible AI development: they mitigate the risks of harmful outputs and enable safer, more beneficial deployment of LLMs across applications.
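To make one of these techniques concrete, the following is a minimal sketch of the DPO objective: the policy is trained to assign a higher implicit reward to the preferred response than to the rejected one, relative to a frozen reference model. The function name, signature, and the assumption that per-sequence log-probabilities are computed elsewhere are illustrative, not taken from any specific paper's codebase.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor of per-sequence log-probabilities
    (sum of token log-probs) that the policy or reference model assigns
    to the chosen or rejected response. `beta` controls how far the
    policy is allowed to drift from the reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    return loss, chosen_rewards.detach(), rejected_rewards.detach()
```

In practice this loss would be computed on batches of (prompt, chosen, rejected) triples and backpropagated through the policy model only; the reference model stays frozen.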