Targeted Activation Penalty

Targeted activation penalty (TAP) research aims to improve the robustness and interpretability of neural networks by penalizing or otherwise manipulating neuron activations. Current work investigates how activation scaling, dropout, and related techniques can mitigate issues such as reliance on spurious signals, massive activations (excessively large values in a few dimensions), and task drift in large language models (LLMs), as well as in other architectures including convolutional neural networks (CNNs) and graph neural networks (GNNs). By better understanding and controlling these internal activations, this line of work seeks to improve generalization, safety, and explainability, yielding more reliable and trustworthy models across applications.
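
As a rough illustration of the penalty-based side of this work, the sketch below adds an auxiliary L2 penalty on a chosen subset of hidden units to the ordinary task loss during training. It is a minimal PyTorch example under generic assumptions; the class and parameter names (TargetedActivationPenalty, target_indices, penalty_weight) are illustrative and do not correspond to any particular paper's implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of a targeted activation penalty (TAP) term. All names here
# (TargetedActivationPenalty, target_indices, penalty_weight) are illustrative
# assumptions, not an API from any specific paper or library.
class TargetedActivationPenalty:
    """Penalize the activations of selected units in one layer during training."""

    def __init__(self, layer: nn.Module, target_indices, penalty_weight: float = 0.01):
        self.target_indices = torch.as_tensor(target_indices)
        self.penalty_weight = penalty_weight
        self._activations = None
        # Capture the layer's output on every forward pass.
        layer.register_forward_hook(self._capture)

    def _capture(self, module, inputs, output):
        self._activations = output

    def __call__(self) -> torch.Tensor:
        # L2 penalty on the targeted dimensions of the most recent activations.
        acts = self._activations[..., self.target_indices]
        return self.penalty_weight * acts.pow(2).mean()


# Example: penalize three hidden units of a small MLP alongside the task loss.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
tap = TargetedActivationPenalty(model[1], target_indices=[0, 5, 7])

x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
logits = model(x)
loss = nn.functional.cross_entropy(logits, y) + tap()
loss.backward()
```

The penalty weight controls the trade-off between task performance and suppressing the targeted activations; which units to target (e.g., those implicated in spurious signals or massive activations) is the part that varies most across the papers below.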

Papers