Attention Entropy
Attention entropy, a measure of the uncertainty or randomness in the attention weights of neural networks, is emerging as a key factor influencing model performance and training stability across various deep learning tasks. Current research focuses on leveraging attention entropy to improve model efficiency (e.g., in image super-resolution and large language models), enhance explainability (e.g., by correlating attention patterns with human gaze data), and mitigate biases (e.g., by regularizing attention to prevent overfitting on specific terms). Understanding and controlling attention entropy offers significant potential for improving the robustness, interpretability, and generalization capabilities of deep learning models in diverse applications, from natural language processing to computer vision.
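The quantity itself is just the Shannon entropy of each softmax-normalized attention row. As a minimal sketch (not drawn from any of the papers below, and assuming PyTorch-style attention tensors of shape batch × heads × queries × keys), it can be computed like this; the function name `attention_entropy` and the toy tensors are illustrative only:

```python
import torch
import torch.nn.functional as F

def attention_entropy(attn_weights: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """Shannon entropy of each attention distribution.

    attn_weights: softmax-normalized attention of shape (..., num_queries, num_keys),
    where each row along the last dimension sums to 1.
    Returns the entropy per query position, shape (..., num_queries).
    """
    p = attn_weights.clamp_min(eps)          # avoid log(0)
    return -(p * p.log()).sum(dim=-1)

# Toy usage: entropy of random attention maps.
scores = torch.randn(2, 4, 10, 10)           # (batch, heads, queries, keys)
attn = F.softmax(scores, dim=-1)             # rows sum to 1
ent = attention_entropy(attn)                # shape (2, 4, 10)
print(ent.mean())                            # uniform attention would give log(10) ≈ 2.30
```

High entropy means attention is spread nearly uniformly over keys; entropy near zero means it has collapsed onto a few positions, which is the failure mode the stability-oriented work below targets.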
Papers
Active Visual Exploration Based on Attention-Map Entropy
Adam Pardyl, Grzegorz Rypeść, Grzegorz Kurzejamski, Bartosz Zieliński, Tomasz Trzciński
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Josh Susskind