Multi-Head Self-Attention
Multi-head self-attention (MHSA) is the core mechanism of transformer-based models: it projects each input sequence into several parallel attention heads, allowing the model to capture long-range dependencies across different representation subspaces. Because standard attention scales quadratically with sequence length, current research focuses on making MHSA more efficient and effective for long sequences through techniques such as low-rank approximations, sparse attention, and adaptive budget allocation, applied within models such as Swin Transformers, Conformer networks, and various Vision Transformers. These advances benefit diverse fields, including speech recognition, image restoration, medical image analysis, and natural language processing, by enabling faster and more accurate processing of complex data. The continued refinement of MHSA is central to scaling up deep learning models and to deploying them in resource-constrained environments.
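To make the mechanism concrete, below is a minimal sketch of multi-head self-attention in NumPy. The function name, weight shapes, random initialisation, and the omission of masking, dropout, and bias terms are illustrative assumptions for this sketch, not the method of any paper listed here.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
        """x: (seq_len, d_model); w_q, w_k, w_v, w_o: (d_model, d_model)."""
        seq_len, d_model = x.shape
        d_head = d_model // num_heads

        # Project to queries, keys, values and split into heads:
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        def split_heads(t):
            return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

        q = split_heads(x @ w_q)
        k = split_heads(x @ w_k)
        v = split_heads(x @ w_v)

        # Scaled dot-product attention per head: softmax(Q K^T / sqrt(d_head)) V
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
        attn = softmax(scores, axis=-1)
        heads = attn @ v                                      # (heads, seq, d_head)

        # Concatenate heads and apply the output projection
        concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
        return concat @ w_o

    # Toy usage: 8 tokens, model width 64, 4 heads (all values chosen arbitrarily)
    rng = np.random.default_rng(0)
    seq_len, d_model, num_heads = 8, 64, 4
    x = rng.standard_normal((seq_len, d_model))
    w_q, w_k, w_v, w_o = (0.1 * rng.standard_normal((d_model, d_model)) for _ in range(4))
    out = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
    print(out.shape)  # (8, 64)

The quadratic cost visible in the (seq, seq) score matrix is exactly what the low-rank and sparse-attention techniques mentioned above aim to reduce.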
Papers
SpectFormer: Frequency and Attention is what you need in a Vision Transformer
Badri N. Patro, Vinay P. Namboodiri, Vijay Srinivas Agneeswaran
Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention
Yiming Ma, Victor Sanchez, Soodeh Nikan, Devesh Upadhyay, Bhushan Atote, Tanaya Guha