Activation Vector
Activation vectors capture the internal state of a neural network at a given layer and are central to understanding and manipulating model behavior. Current research uses these vectors for model interpretability, for steering model outputs (e.g., by scaling or tuning activation directions), and for improving performance in open-set recognition. These studies typically target transformer architectures and employ techniques such as gradient-based optimization and cosine-similarity-based loss functions. The ability to analyze and manipulate activation vectors effectively holds significant promise for improving model transparency, safety, and performance across applications.
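As a rough illustration of the steering idea described above, the sketch below builds a steering vector as the difference of mean activations between two contrastive example sets, adds a scaled copy of it to a hidden state, and checks the shift with cosine similarity. All names, shapes, and the difference-of-means construction here are illustrative assumptions, not a method from any specific paper listed below.

```python
import numpy as np

# Hypothetical per-layer activations: rows are examples, columns are hidden units.
rng = np.random.default_rng(0)
pos_acts = rng.normal(1.0, 0.5, size=(8, 16))   # examples exhibiting the target behavior
neg_acts = rng.normal(-1.0, 0.5, size=(8, 16))  # contrastive examples

# One simple way to get a steering direction: difference of the two means.
steer = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def apply_steering(hidden, vector, alpha=1.0):
    """Shift a hidden state along the steering direction, scaled by alpha."""
    return hidden + alpha * vector

def cosine_similarity(a, b):
    """Cosine similarity, the quantity similarity-based losses are built on."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

h = neg_acts[0]
steered = apply_steering(h, steer, alpha=1.0)
# Adding a positive multiple of the steering vector strictly increases
# cosine similarity with it (by Cauchy-Schwarz), so this prints True.
print(cosine_similarity(steered, steer) > cosine_similarity(h, steer))  # → True
```

The scaling factor `alpha` plays the role of the "scaling" mentioned above: larger values push the hidden state further along the steering direction, trading off fluency against the strength of the induced behavior.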
Papers
November 13, 2024
October 9, 2024
October 7, 2024
September 11, 2024
May 27, 2024
May 7, 2024
March 18, 2024
November 21, 2022