Alignment Approach
Alignment approaches aim to ensure that AI models, particularly large language models, behave in ways consistent with human values and intentions. Current research focuses on developing and evaluating alignment techniques such as reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and methods leveraging in-context learning and prompt engineering, often implemented within specific model architectures such as mixture-of-experts. These efforts are crucial for mitigating the risks of misaligned AI and for building trustworthy, beneficial AI systems across diverse applications, from healthcare to conversational agents.
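To make one of the named techniques concrete, the sketch below shows the core DPO objective: the policy is trained to prefer a chosen response over a rejected one by more than a frozen reference model does. This is a minimal illustration in PyTorch, not any specific paper's implementation; the function name, argument names, and the toy random inputs are placeholders for real model log-probabilities.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss on a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probabilities for
    (prompt, response) pairs under the trainable policy or the frozen
    reference model; beta controls how strongly the policy may deviate
    from the reference.
    """
    # Implicit rewards: scaled log-ratio of policy to reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random stand-ins for real model log-probabilities.
if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
    print(f"DPO loss: {loss.item():.4f}")

In practice the log-probabilities come from a forward pass of the policy and reference models over human-labeled preference pairs; RLHF pursues the same goal but does so indirectly, by fitting an explicit reward model and optimizing against it with reinforcement learning.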