Alignment Model
Alignment models aim to bring the outputs of large language models (LLMs) into agreement with human intentions and preferences, addressing concerns such as bias, safety, and reliability. Current research focuses on efficient alignment techniques, including Bayesian persuasion, preference learning, and multi-LLM collaboration, often employing novel architectures such as focused-view fusion networks or incorporating external knowledge sources such as diagnostic rules. These advances are crucial for making LLMs more trustworthy and beneficial across diverse applications, from medical diagnosis to image and video retrieval, and for strengthening their robustness against adversarial attacks.
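As a concrete illustration of the preference-learning family mentioned above, the sketch below shows a Direct Preference Optimization (DPO)-style loss, one common formulation in which a policy model is nudged to prefer human-chosen responses over rejected ones relative to a frozen reference model. This is a minimal, generic example; the function name and the toy log-probability values are placeholders, not taken from any of the papers listed on this page.

```python
# Minimal sketch of a DPO-style preference-learning loss for LLM alignment.
# Inputs are summed log-probabilities of whole responses under the trainable
# policy and a frozen reference model; `beta` limits drift from the reference.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratio of policy vs. reference for the preferred and rejected responses.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # The loss shrinks as the margin between preferred and rejected responses grows.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with placeholder log-probabilities for a batch of two comparisons.
policy_chosen = torch.tensor([-12.0, -9.5])
policy_rejected = torch.tensor([-14.0, -9.0])
ref_chosen = torch.tensor([-12.5, -10.0])
ref_rejected = torch.tensor([-13.0, -9.8])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```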
19 papers