LLM Behavior
Research on Large Language Model (LLM) behavior focuses on understanding and controlling model outputs, particularly with respect to safety and reliability. Current efforts include methods for interpreting LLM decision-making, such as meta-models that analyze a model's internal activations, and control mechanisms such as activation steering and prompt baking that mitigate harmful or undesirable behaviors. This work is central to building trustworthy and beneficial LLMs: it addresses concerns about the replicability of evaluation methodologies and the need for robust techniques that support responsible deployment across applications.
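Of the control mechanisms mentioned above, activation steering is the most direct to illustrate: a "steering vector" is added to a chosen layer's hidden states at inference time to push generations toward a desired behavior. The sketch below is a minimal, hypothetical example and is not drawn from any specific paper in this collection; the choice of GPT-2, the steered layer index, the contrastive prompts, and the scaling factor are all illustrative assumptions.

```python
# Minimal activation-steering sketch (assumptions: GPT-2, layer 6, a steering
# vector built from one contrastive prompt pair, scale 4.0 -- all illustrative).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 6  # hypothetical choice of transformer block to steer

def hidden_at_layer(text: str) -> torch.Tensor:
    """Mean hidden state of `text` after block LAYER (one cheap way to get a direction)."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids, output_hidden_states=True).hidden_states
    # hidden_states[0] is the embedding layer, so block LAYER's output is index LAYER + 1.
    return hs[LAYER + 1].mean(dim=1).squeeze(0)

# Contrastive construction: direction pointing from an undesired to a desired style.
steer = hidden_at_layer("I am calm and helpful.") - hidden_at_layer("I am hostile and rude.")
steer = 4.0 * steer / steer.norm()  # the scale is a tunable assumption

def hook(module, inputs, output):
    # GPT-2 blocks return a tuple; element 0 holds the hidden states.
    hidden = output[0] + steer  # shift every position along the steering direction
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(hook)
ids = tok("The customer asked for a refund and I said", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30, do_sample=False)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```

In practice the steering direction is usually estimated from many contrastive pairs rather than one, and the layer and scale are tuned empirically; the same forward-hook pattern also supports ablating or clamping activation directions.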
Papers
18 papers, dated March 1, 2024 through October 29, 2024.