Alignment Tuning
Alignment tuning modifies large language models (LLMs) so that their behavior better matches human preferences and avoids harmful outputs. Current research focuses on improving the robustness and efficiency of alignment techniques, exploring methods such as direct preference optimization (DPO) and its variants, as well as novel approaches like subspace projection and in-context learning that reduce reliance on extensive fine-tuning. These efforts are crucial for mitigating the risks LLMs pose and for improving their trustworthiness and usability in sensitive applications such as medical diagnosis. A minimal sketch of the DPO objective is given below.
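As a concrete illustration of the DPO objective mentioned above, the following is a minimal sketch, assuming per-sequence log-probabilities have already been computed for a policy model and a frozen reference model; the function and variable names are illustrative, not taken from any specific paper or library.

```python
# Sketch of the DPO loss: -log sigmoid(beta * (log-ratio margin between
# the preferred ("chosen") and dispreferred ("rejected") responses)).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit reward of each response: beta * log(pi_theta / pi_ref)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to assign a higher implicit reward to the chosen response
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of two preference pairs
loss = dpo_loss(torch.tensor([-12.0, -9.5]),   # policy log p(chosen)
                torch.tensor([-14.0, -11.0]),  # policy log p(rejected)
                torch.tensor([-12.5, -10.0]),  # reference log p(chosen)
                torch.tensor([-13.5, -10.5]))  # reference log p(rejected)
print(loss.item())
```

The key design point is that no explicit reward model or reinforcement-learning loop is needed: the preference signal is optimized directly through the log-ratio margin against the frozen reference model.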