Targeted Activation Penalty
Targeted activation penalty (TAP) research aims to improve the robustness and interpretability of neural networks by manipulating neuron activations. Current work investigates how activation scaling, dropout, and related techniques can mitigate reliance on spurious signals, massive activations (excessively large values in specific dimensions), and task drift in large language models (LLMs), as well as in other architectures such as convolutional neural networks (CNNs) and graph neural networks (GNNs). By better understanding and controlling these internal dynamics, this line of work seeks to enhance generalization, safety, and explainability, yielding more reliable and trustworthy AI systems across a range of applications.
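As a concrete illustration of the general idea, a targeted activation penalty can be sketched as a regularization term that penalizes only selected activation dimensions, and only when their magnitude exceeds a threshold (as in the "massive activations" problem above). This is a minimal NumPy sketch under assumed choices, not the method of any specific paper: the squared-hinge form, the function name, and all parameters (`target_dims`, `threshold`, `weight`) are illustrative.

```python
import numpy as np

def targeted_activation_penalty(activations, target_dims, threshold=1.0, weight=1e-3):
    """Penalty on selected activation dimensions whose magnitude exceeds a threshold.

    activations: array of shape (batch, features)
    target_dims: indices of the feature dimensions to penalize, e.g. dimensions
        observed to carry excessively large ("massive") activations
    threshold:   magnitudes at or below this value incur no penalty
    weight:      scale of the penalty term added to the training loss
    """
    targeted = activations[:, target_dims]
    # Squared-hinge: only the excess magnitude beyond the threshold is penalized.
    excess = np.maximum(np.abs(targeted) - threshold, 0.0)
    return weight * np.sum(excess ** 2)

# Toy example: dimension 2 carries excessively large activations.
acts = np.array([[0.5, -0.3, 12.0],
                 [0.1,  0.8, 15.0]])
penalty = targeted_activation_penalty(acts, target_dims=[2])
```

In training, such a term would be added to the task loss so that gradient descent suppresses the targeted activations without directly constraining the rest of the network.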
Papers
Relating Regularization and Generalization through the Intrinsic Dimension of Activations
Bradley C. A. Brown, Jordan Juravsky, Anthony L. Caterini, Gabriel Loaiza-Ganem
ActMAD: Activation Matching to Align Distributions for Test-Time-Training
Muhammad Jehanzeb Mirza, Pol Jané Soneira, Wei Lin, Mateusz Kozinski, Horst Possegger, Horst Bischof