Quadratic Attention
Quadratic attention refers to standard self-attention, whose compute and memory costs scale as O(n²) in sequence length because every token attends to every other token. This scaling is a central bottleneck in many transformer-based models and a focus of current research on efficiency and scalability. Researchers are exploring alternative architectures, such as linear attention mechanisms and state space models (like Mamba), that approximate or replace quadratic attention while maintaining performance. This work is driven by the need to handle longer sequences and larger datasets in tasks ranging from language modeling and audio classification to autonomous driving and genomic analysis, ultimately shaping the feasibility and performance of large-scale machine learning systems.
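To make the scaling difference concrete, here is a minimal NumPy sketch (not drawn from any particular paper) contrasting standard softmax attention, whose n×n score matrix gives O(n²) cost, with a kernelized linear-attention approximation that reassociates the matrix product to run in time linear in sequence length. The feature map `phi` is an illustrative placeholder for the positive feature maps used in linear-attention work.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard (quadratic) attention: the explicit n x n score matrix
    # makes cost and memory scale as O(n^2) in sequence length n.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                   # (n, d_v)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized (linear) attention: mapping Q and K through a positive
    # feature map phi lets us compute (phi(Q) @ (phi(K).T @ V)) instead of
    # ((phi(Q) @ phi(K).T) @ V), so cost scales as O(n * d^2), i.e.
    # linearly in sequence length. phi here is a simple stand-in.
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                                        # (d, d_v) summary
    Z = Qp @ Kp.sum(axis=0)                              # (n,) normalizer
    return (Qp @ KV) / Z[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 512, 64                                       # seq length, head dim
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(softmax_attention(Q, K, V).shape)              # (512, 64)
    print(linear_attention(Q, K, V).shape)               # (512, 64)
```

The key design point is the reassociation of the product: once the n×n attention matrix is never materialized, memory and compute grow linearly with n, which is what makes long-sequence applications tractable.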