Salient Neuron

Salient neuron research identifies and studies the individual neurons in large language models (LLMs) and other deep neural networks that matter most for a particular task or feature. Current work locates these neurons with techniques such as sparse probing and linear classifiers, often in the intermediate layers of LLMs, and examines how their activation patterns relate to input features and model performance. The goals are better interpretability, greater efficiency (by focusing on the essential neurons), and, potentially, better model design and training informed by a clearer picture of internal representations. The findings could lead to more efficient and explainable AI systems.
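
To make the idea of sparse probing concrete, the following is a minimal sketch rather than the method of any paper listed here. It assumes synthetic activations in place of real LLM activations, plants the feature in a single hypothetical neuron (index 17), ranks neurons by a simple class-mean-difference heuristic, and fits a logistic-regression probe on only the top-ranked neuron. All function names and parameters are illustrative.

```python
import numpy as np

def rank_neurons(z, labels):
    """Rank neurons by the absolute gap between their class means (largest first)."""
    gap = np.abs(z[labels == 1].mean(axis=0) - z[labels == 0].mean(axis=0))
    return np.argsort(-gap)

def fit_probe(x, labels, steps=500, lr=0.5):
    """Fit a logistic-regression probe with plain gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # predicted probabilities
        w -= lr * x.T @ (p - labels) / len(labels)
        b -= lr * np.mean(p - labels)
    return w, b

def probe_accuracy(x, labels, w, b):
    """Fraction of samples the probe classifies correctly."""
    return float(np.mean(((x @ w + b) > 0) == (labels == 1)))

# Synthetic stand-in for real activations (an assumption of this sketch):
# 400 samples, 64 neurons, and only neuron 17 carries the binary feature.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)
acts = rng.normal(size=(400, 64))
acts[:, 17] += 3.0 * labels

# Standardize each neuron so scores and gradients are on a comparable scale.
z = (acts - acts.mean(axis=0)) / (acts.std(axis=0) + 1e-8)

k = 1                                   # sparsity: number of neurons the probe may use
top_k = rank_neurons(z, labels)[:k]
w, b = fit_probe(z[:, top_k], labels)
accuracy = probe_accuracy(z[:, top_k], labels, w, b)
print("selected neurons:", top_k, "probe accuracy:", accuracy)
```

In this synthetic setup the ranking step should recover the planted neuron, and a probe restricted to it should classify most samples correctly. With real models, the activations would come from a chosen layer, the probe would be evaluated on held-out data, and k would be varied to see how concentrated the feature is.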

Papers

November 27, 2023

SPIN: Sparsifying and Integrating Internal Neurons in Large Language Models for Text Classification
Difan Jiao, Yilun Liu, Zhenwei Tang, Daniel Matter, Jürgen Pfeffer, Ashton Anderson
Text Classification Random Sparsification Spin Direction Salient Neuron

October 22, 2023

Universal representation by Boltzmann machines with Regularised Axons
Przemysław R. Grzybowski, Antoni Jankiewicz, Eloy Piñol, David Cirauqui, Dorota H. Grzybowska, Paweł M. Petrykowski, Miguel Ángel García-March, Maciej Lewenstein, Gorka Muñoz-Gil, Alejandro Pozas-Kerstjens
Neural Network Regularization Model Boltzmann Machine Universal Representation Hidden Pattern Salient Neuron

May 2, 2023

Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
Large Language Model Case Study Linear Probing Interpretable Feature Individual Neuron Video Haystack Contextual Feature Salient Neuron

October 24, 2022

Investigating Neuron Disturbing in Fusing Heterogeneous Neural Networks
Biao Zhang, Shuqin Zhang
Deep Learning Model Model Fusion Heterogeneous Autoencoder Salient Neuron

June 27, 2022

Discovering Salient Neurons in Deep NLP Models
Nadir Durrani, Fahim Dalvi, Hassan Sajjad
Linguistic Information Linguistic Knowledge Salient Neuron
