Model Behavior
Research on large language model (LLM) behavior focuses on understanding and mitigating undesirable outputs, such as toxicity and bias, while improving desirable traits like helpfulness and accuracy. Current efforts investigate post-hoc safety alignment, analyze how decoding strategies and persona assignment shape model responses, and develop techniques to interpret and edit model behavior through targeted interventions or data manipulation. These studies are crucial for building safer and more reliable LLMs, informing both the development of trustworthy AI systems and our broader understanding of how these complex models work.