AI Alignment
AI alignment focuses on ensuring that artificial intelligence systems act in accordance with human values and intentions, addressing the risks posed by misaligned goals. Current research centers on techniques such as reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO), typically applied to large language models (LLMs), alongside related methods like reward shaping and preference aggregation. The field is central to responsible AI development, shaping both the safety and the ethical implications of increasingly capable AI systems across a wide range of applications.
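To make the DPO technique mentioned above concrete, below is a minimal sketch of its preference loss in Python with PyTorch. The function name, argument names, and the beta value are illustrative assumptions, not taken from any specific paper listed here; it assumes per-sequence log-probabilities have already been computed for the preferred and dispreferred responses under the trained policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the DPO objective on a batch of preference pairs.

    Each argument is a 1-D tensor of per-sequence log-probabilities
    (summed over tokens) for the preferred ("chosen") and
    dispreferred ("rejected") responses. beta is illustrative.
    """
    # Log-ratio of policy to reference model for each response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps

    # Push the chosen log-ratio above the rejected one, scaled by beta,
    # through a logistic (sigmoid) loss.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()


# Toy usage with random log-probabilities for a batch of 4 pairs.
if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b),
                    torch.randn(b), torch.randn(b))
    print(loss.item())
```

Unlike RLHF, this formulation needs no separately trained reward model or reinforcement-learning loop: the preference data enters the loss directly.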
Papers