Activation Space
Activation space is the high-dimensional space formed by the internal states of a neural network: the activations of its neurons across layers. Current research focuses on understanding and manipulating this space to improve model performance, interpretability, and security, using techniques such as contrastive activation addition, activation-space-selectable networks, and analysis of activation patterns to detect backdoors or improve generalization. This work is central to making AI systems more reliable, robust, and explainable across applications ranging from natural language processing to image recognition and reinforcement learning.
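One of the techniques mentioned above, contrastive activation addition, steers a model by adding a vector computed from the difference between activations on contrasting inputs. The sketch below is a minimal illustration of the idea, assuming a toy PyTorch MLP as a stand-in for a real model; the layer choice, the "positive"/"negative" input batches, and the steering coefficient are all illustrative assumptions rather than details taken from any specific paper.

```python
# Minimal sketch of contrastive activation addition (activation steering).
# Assumptions: a toy two-layer MLP stands in for a real model; the hidden
# layer, inputs, and steering coefficient are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model: the post-ReLU hidden output is the "activation space" we steer in.
model = nn.Sequential(
    nn.Linear(16, 32),   # hidden layer
    nn.ReLU(),
    nn.Linear(32, 4),    # task head
)

def hidden_activations(x):
    """Return the post-ReLU hidden activations for a batch of inputs."""
    acts = {}
    def hook(_module, _inp, out):
        acts["h"] = out.detach()
    handle = model[1].register_forward_hook(hook)
    model(x)
    handle.remove()
    return acts["h"]

# Contrastive batches: inputs that exhibit the target behaviour ("positive")
# and inputs that do not ("negative"); random stand-ins here.
pos_inputs = torch.randn(8, 16) + 1.0
neg_inputs = torch.randn(8, 16) - 1.0

# Steering vector = difference of mean activations over the two batches.
steering_vec = (hidden_activations(pos_inputs).mean(0)
                - hidden_activations(neg_inputs).mean(0))

def steered_forward(x, coeff=2.0):
    """Run the model while adding coeff * steering_vec to the hidden layer."""
    def hook(_module, _inp, out):
        return out + coeff * steering_vec  # returned value replaces the output
    handle = model[1].register_forward_hook(hook)
    try:
        return model(x)
    finally:
        handle.remove()

x = torch.randn(1, 16)
print("baseline:", model(x))
print("steered: ", steered_forward(x))
```

The same hook-based pattern carries over to larger models: the steering vector is computed once from a handful of contrasting examples and then added at inference time, with the coefficient controlling how strongly the behaviour is pushed.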