Activation Space
Activation space is the high-dimensional space formed by the internal states of a neural network: the activations of its neurons across layers. Current research focuses on understanding and manipulating this space to improve model performance, interpretability, and security, using techniques such as contrastive activation addition, activation-space-selectable networks, and analysis of activation patterns to detect backdoors or improve generalization. This work is central to making AI systems more reliable, robust, and explainable across applications ranging from natural language processing to image recognition and reinforcement learning.
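One of the techniques mentioned above, contrastive activation addition, steers a model by adding a vector computed from the difference between activations on contrasting inputs. The sketch below is a minimal illustration of the idea, assuming a toy PyTorch MLP as a stand-in for a real model; the layer choice, the "positive"/"negative" input batches, and the steering coefficient are all illustrative assumptions rather than details taken from any specific paper.

```python
# Minimal sketch of contrastive activation addition (activation steering).
# Assumptions: a toy two-layer MLP stands in for a real model; the hidden
# layer, inputs, and steering coefficient are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model: the post-ReLU hidden output is the "activation space" we steer in.
model = nn.Sequential(
    nn.Linear(16, 32),   # hidden layer
    nn.ReLU(),
    nn.Linear(32, 4),    # task head
)

def hidden_activations(x):
    """Return the post-ReLU hidden activations for a batch of inputs."""
    acts = {}
    def hook(_module, _inp, out):
        acts["h"] = out.detach()
    handle = model[1].register_forward_hook(hook)
    model(x)
    handle.remove()
    return acts["h"]

# Contrastive batches: inputs that exhibit the target behaviour ("positive")
# and inputs that do not ("negative"); random stand-ins here.
pos_inputs = torch.randn(8, 16) + 1.0
neg_inputs = torch.randn(8, 16) - 1.0

# Steering vector = difference of mean activations over the two batches.
steering_vec = (hidden_activations(pos_inputs).mean(0)
                - hidden_activations(neg_inputs).mean(0))

def steered_forward(x, coeff=2.0):
    """Run the model while adding coeff * steering_vec to the hidden layer."""
    def hook(_module, _inp, out):
        return out + coeff * steering_vec  # returned value replaces the output
    handle = model[1].register_forward_hook(hook)
    try:
        return model(x)
    finally:
        handle.remove()

x = torch.randn(1, 16)
print("baseline:", model(x))
print("steered: ", steered_forward(x))
```

The same hook-based pattern carries over to larger models: the steering vector is computed once from a handful of contrasting examples and then added at inference time, with the coefficient controlling how strongly the behaviour is pushed.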