Activation Space

Activation space is the high-dimensional space formed by a neural network's internal states: the activation values of its neurons at each layer. Current research focuses on understanding and manipulating this space to improve model performance, interpretability, and security, using techniques such as contrastive activation addition, activation-space-selectable networks, and analysis of activation patterns to detect backdoors or improve generalization. This work is crucial for making AI systems more reliable, robust, and explainable across applications ranging from natural language processing to image recognition and reinforcement learning.
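The core idea behind contrastive activation addition can be sketched in a few lines: collect a layer's activations on two contrasting sets of inputs, take the difference of their means as a steering vector, and add that vector back to activations at inference time. The sketch below uses synthetic activation matrices in place of a real model; the array names, dimensions, and the `steer` helper are illustrative assumptions, not any library's API.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 8

# Hypothetical layer activations for two contrasting prompt sets
# (rows are examples, columns are hidden units). In practice these
# would be captured from a real network with forward hooks.
acts_positive = rng.normal(loc=1.0, size=(16, hidden_dim))
acts_negative = rng.normal(loc=-1.0, size=(16, hidden_dim))

# Contrastive activation addition: the steering vector is the
# difference between the mean activations of the two sets.
steering_vector = acts_positive.mean(axis=0) - acts_negative.mean(axis=0)

def steer(activations, vector, alpha=1.0):
    """Shift activations along the steering vector with strength alpha."""
    return activations + alpha * vector

# Steering "negative" activations partway toward the "positive" region.
steered = steer(acts_negative, steering_vector, alpha=0.5)
```

At inference, the same vector would be added to the layer's activations on new inputs, biasing the model toward the behavior captured by the positive set; `alpha` trades steering strength against distortion of the original representation.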

Papers