Language Model Behavior
Research on language model behavior aims to understand and control the diverse outputs and capabilities of large language models (LLMs), focusing on issues like reliability, bias, and alignment with human values. Current investigations explore techniques such as persona assignment, steering vectors, and knowledge-circuit analysis to probe models' internal mechanisms and improve control over their behavior. This work is crucial for mitigating risks associated with LLMs, such as misinformation and harmful outputs, and for developing more robust and beneficial AI systems across a range of applications.
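To make one of these techniques concrete, below is a minimal sketch of steering with an activation ("steering") vector: the difference of mean activations on a contrastive prompt pair is added back into the residual stream during generation. It assumes a GPT-2 model from Hugging Face transformers; the layer index, scaling factor, and prompts are illustrative assumptions, not values from any particular paper.

```python
# Minimal activation-steering sketch (assumed setup: GPT-2 via transformers).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6    # hypothetical layer to steer at
SCALE = 4.0  # hypothetical steering strength

def mean_hidden(prompt: str) -> torch.Tensor:
    """Mean residual-stream activation at LAYER for a prompt."""
    ids = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[LAYER + 1] is the output of transformer block LAYER
    # (index 0 holds the embeddings).
    return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)

# The steering vector: difference of means over a contrastive prompt pair.
steer = mean_hidden("I love this. It is wonderful.") - mean_hidden(
    "I hate this. It is terrible."
)

def add_steer(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states are the first element.
    hidden = output[0] + SCALE * steer.to(output[0].dtype)
    return (hidden,) + output[1:]

# Hook the chosen block so every forward pass during generation is steered.
handle = model.transformer.h[LAYER].register_forward_hook(add_steer)
try:
    ids = tokenizer("The movie was", return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()  # always detach the hook to restore normal behavior
```

With a positive SCALE this nudges completions toward the "positive sentiment" direction of the pair; negating the vector steers the other way. Published steering-vector methods differ in how the vector is computed and where it is injected, but this captures the basic mechanism.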