Self-Alignment
Self-alignment in large language models (LLMs) aims to bring model behavior in line with desired characteristics, such as adherence to cultural values or factual accuracy, without extensive human supervision. Current research explores several methods, including iterative self-enhancement paradigms, meta-rewarding techniques in which the model judges its own responses, and the resolution of contradictions among the model's own preferences. These methods aim to reduce reliance on costly human annotation and to improve the reliability and safety of LLMs, which matters both for the development of more robust AI systems and for their practical applications across diverse fields.
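As a rough illustration of the self-judging idea described above, the following sketch builds preference pairs from a model's own samples: several candidate responses are generated per prompt, the same system scores them, and the best and worst candidates become a (chosen, rejected) pair. All names here (`build_preference_pairs`, `toy_generate`, `toy_judge`) are hypothetical, and the toy functions merely stand in for LLM calls; this is a sketch under those assumptions, not the procedure of any particular paper.

```python
from typing import Callable, List, Tuple

# (prompt, chosen response, rejected response)
PreferencePair = Tuple[str, str, str]


def build_preference_pairs(
    prompts: List[str],
    generate: Callable[[str, int], str],
    judge: Callable[[str, str], float],
    n_samples: int = 4,
) -> List[PreferencePair]:
    """Sample candidates per prompt, score them with the model's own judge,
    and keep the best/worst candidates as one preference pair."""
    pairs: List[PreferencePair] = []
    for prompt in prompts:
        candidates = [generate(prompt, seed) for seed in range(n_samples)]
        scored = [(judge(prompt, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda item: item[0])
        worst_score, worst = min(scored, key=lambda item: item[0])
        # Skip prompts where the judge cannot separate the candidates.
        if best_score > worst_score:
            pairs.append((prompt, best, worst))
    return pairs


# Toy stand-ins (assumptions): a real setup would call the same LLM for both roles.
def toy_generate(prompt: str, seed: int) -> str:
    return prompt + " answer" + " detail" * seed


def toy_judge(prompt: str, response: str) -> float:
    # Placeholder scoring rule: longer responses score higher.
    return float(len(response.split()))


pairs = build_preference_pairs(["Q"], toy_generate, toy_judge, n_samples=3)
```

In an iterative self-enhancement loop, pairs like these would typically feed a preference-based fine-tuning step, after which the updated model generates and judges the next round. No human labels enter the loop, which is where the saving in annotation cost comes from.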
Papers
November 26, 2024
November 13, 2024
November 6, 2024
October 31, 2024
October 18, 2024
October 12, 2024
August 29, 2024
August 15, 2024
July 28, 2024
June 13, 2024
June 5, 2024
May 12, 2024
May 2, 2024
May 1, 2024
February 28, 2024
February 23, 2024
February 14, 2024
February 12, 2024
February 8, 2024