Activation Space
Activation space is the high-dimensional vector space of a neural network's internal states: each point in it collects the activations of the neurons at one or more layers for a given input. Current research aims to understand and manipulate this space to improve model performance, interpretability, and security. Techniques include contrastive activation addition, activation space selectable networks, and the analysis of activation patterns to detect backdoors or improve generalization. This work supports more reliable, robust, and explainable artificial intelligence systems in applications ranging from natural language processing to image recognition and reinforcement learning.
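As a rough illustration of the idea behind contrastive activation addition, the sketch below builds a steering vector as the mean difference between activations from two contrasting sets of inputs and adds a scaled copy of it to new activations. It is a minimal sketch on synthetic NumPy arrays, not a real model: the function names `steering_vector` and `add_steering`, the scaling factor `alpha`, the hidden size, and the toy data are all assumptions made for illustration and do not come from any particular paper or library.

```python
import numpy as np


def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Mean difference between activations of two contrasting input sets.

    Both arrays have shape (num_examples, hidden_dim).
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def add_steering(acts: np.ndarray, vector: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Shift every activation row by alpha * vector (broadcast over rows)."""
    return acts + alpha * vector


# Toy stand-in for activations read from one layer (hypothetical data).
rng = np.random.default_rng(0)
hidden_dim = 8

# Assumption: the contrasting behavior lives along the first coordinate.
direction = np.zeros(hidden_dim)
direction[0] = 1.0

pos_acts = rng.normal(size=(32, hidden_dim)) + 2.0 * direction
neg_acts = rng.normal(size=(32, hidden_dim)) - 2.0 * direction

v = steering_vector(pos_acts, neg_acts)

# Steer a fresh batch of activations toward the "positive" side.
new_acts = rng.normal(size=(4, hidden_dim))
steered = add_steering(new_acts, v, alpha=0.5)
```

In a real network the activations would be read from, and written back to, a chosen layer during the forward pass; the arithmetic on the vectors is the same as in this toy version.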