Self-Attention Layer

Self-attention layers are a core component of Transformer networks, allowing these models to process sequential data by weighting the importance of different elements within a sequence. Current research focuses on improving the efficiency and theoretical understanding of self-attention: exploring its optimization dynamics, analyzing its role in generalization and hallucination in large language models, and developing alternative attention mechanisms, such as Locality Sensitive Hashing or polynomial-based approaches, to reduce computational cost. By enhancing model performance and scalability, these advances are driving improvements across applications ranging from image segmentation and super-resolution to natural language processing and visual place recognition.
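For reference, the sketch below illustrates the weighting the paragraph describes: a minimal single-head, unmasked scaled dot-product self-attention layer in PyTorch. The class name, the single-head setup, and the example dimensions are illustrative assumptions, not taken from any particular paper listed here.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Minimal single-head scaled dot-product self-attention (no masking)."""

    def __init__(self, embed_dim: int):
        super().__init__()
        # Learned projections for queries, keys, and values.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Pairwise similarity between all positions, scaled by sqrt(d).
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
        # Softmax turns the scores into per-position attention weights.
        weights = F.softmax(scores, dim=-1)
        # Each output position is a weighted sum of the value vectors.
        return torch.matmul(weights, v)

# Example: a batch of 2 sequences, each with 5 tokens of dimension 16.
attn = SelfAttention(embed_dim=16)
out = attn(torch.randn(2, 5, 16))
print(out.shape)  # torch.Size([2, 5, 16])
```

Note that the score matrix has shape (seq_len, seq_len), so memory and compute grow quadratically with sequence length; this is the cost that alternatives such as Locality Sensitive Hashing or polynomial-based attention aim to reduce.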

Papers