Quadratic Attention

Quadratic attention is a computational bottleneck in many transformer-based models: because every token attends to every other token, the cost of the attention layer grows as O(n²) in the sequence length n. Reducing this cost is a focus of current research aiming to improve efficiency and scalability across applications. Researchers are exploring alternative architectures, such as linear attention mechanisms and state space models (like Mamba), that approximate or replace quadratic attention while maintaining performance. This work is driven by the need to handle longer sequences and larger datasets in tasks ranging from language modeling and audio classification to autonomous driving and genomic analysis, and it ultimately shapes the feasibility and performance of large-scale machine learning systems.
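To make the contrast concrete, the sketch below compares standard softmax attention, which materializes an n × n score matrix, with a kernelized linear attention that reassociates the matrix products to avoid it. It is a minimal NumPy illustration of the general idea (the elu(x) + 1 feature map follows Katharopoulos et al., 2020), not an implementation of any specific paper listed here; the function names and toy shapes are chosen only for exposition.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def quadratic_attention(Q, K, V):
    # Standard softmax attention: the (n x n) score matrix makes
    # time and memory scale as O(n^2) in sequence length n.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)           # (n, n)
    return softmax(scores, axis=-1) @ V     # (n, d_v)

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized linear attention: applying a positive feature map
    # phi and reassociating (phi(Q) (phi(K)^T V)) avoids the n x n
    # matrix, giving O(n) time and memory in sequence length.
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))   # elu(x) + 1
    Qp, Kp = phi(Q), phi(K)                  # (n, d)
    kv = Kp.T @ V                            # (d, d_v) -- no n x n term
    z = Qp @ Kp.sum(axis=0, keepdims=True).T + eps   # (n, 1) normalizer
    return (Qp @ kv) / z

# Toy comparison on random inputs.
rng = np.random.default_rng(0)
n, d = 128, 16
Q, K, V = rng.normal(size=(3, n, d))
print(quadratic_attention(Q, K, V).shape)    # (128, 16)
print(linear_attention(Q, K, V).shape)       # (128, 16)

The two functions return different values (linear attention is an approximation, not a reparameterization of softmax attention), but the shapes and interfaces match, which is what makes such drop-in replacements attractive for long-sequence workloads.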

Papers