Polysemantic Neuron
Polysemantic neurons, individual units in a neural network that respond to multiple unrelated features, pose a significant challenge to interpretability. Current research focuses on understanding the origins and consequences of polysemanticity, using techniques such as sparse autoencoders and analyses of second-order effects to disentangle these mixed responses in models like CLIP and ResNet. This work aims to improve model interpretability, and potentially model performance, either by promoting monosemanticity (one neuron, one feature) or by leveraging the information encoded within polysemantic neurons. The resulting insights could lead to more transparent and controllable AI systems.
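To make the sparse-autoencoder approach mentioned above concrete, here is a minimal PyTorch sketch. It is not taken from any of the papers this page covers; the dimensions, names, and hyperparameters are illustrative assumptions. The idea is to train an overcomplete autoencoder to reconstruct a layer's activations through a ReLU hidden code, with an L1 penalty that pushes most hidden features to zero for any given input, so that individual dictionary features tend to be more monosemantic than the original neurons.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder trained to reconstruct a layer's
    activations through a sparse hidden code. Hyperparameters and
    names here are hypothetical, for illustration only."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # d_hidden >> d_model: learn more dictionary features than
        # the layer has neurons, so mixed signals can be pulled apart.
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        code = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(code)          # reconstructed activations
        return recon, code


def sae_loss(recon, x, code, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that drives most
    # feature activations to zero for any given input.
    mse = torch.mean((recon - x) ** 2)
    sparsity = l1_coeff * code.abs().mean()
    return mse + sparsity


if __name__ == "__main__":
    d_model, d_hidden = 512, 4096           # illustrative sizes
    sae = SparseAutoencoder(d_model, d_hidden)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

    # Stand-in for activations captured from a real model layer.
    activations = torch.randn(64, d_model)

    recon, code = sae(activations)
    loss = sae_loss(recon, activations, code)
    loss.backward()
    opt.step()
```

After training on a large sample of real activations, each hidden feature can be inspected individually (e.g., by finding the inputs that maximally activate it), which is what makes this a disentanglement tool rather than just a compressor.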