Memory Efficient Attention
Memory-efficient attention mechanisms reduce the compute and memory cost of self-attention in transformer-based models. This matters most for long sequences, because standard self-attention scales quadratically with sequence length. Current research optimizes the attention computation through techniques such as in-storage computation, modified softmax functions with constant time complexity, and strategies that exploit locality or tree-structured attention. These advances make it practical to deploy large language models and other attention-based architectures on resource-constrained devices and to process significantly longer input sequences.
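As a rough illustration of where the memory saving can come from, the sketch below computes attention for a single query in one pass over the keys and values, keeping only a running maximum, a running normalizer, and a running weighted sum instead of the full score vector. This is a generic streaming-softmax formulation, not the method of any particular paper listed here; the function names `naive_attention` and `streaming_attention` are hypothetical, and pure Python lists stand in for tensors.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: stores every score before normalizing."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def streaming_attention(q, keys, values):
    """Single pass over (key, value) pairs with constant extra state."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf          # largest score seen so far
    running_sum = 0.0                # softmax normalizer under running_max
    acc = [0.0] * len(values[0])     # unnormalized weighted sum of values
    for k, v in zip(keys, values):
        s = scale * sum(a * b for a, b in zip(q, k))
        new_max = max(running_max, s)
        # Rescale everything accumulated under the old maximum.
        correction = math.exp(running_max - new_max)
        w = math.exp(s - new_max)
        running_sum = running_sum * correction + w
        acc = [a * correction + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(naive_attention(q, keys, values))
    print(streaming_attention(q, keys, values))
```

Both functions should agree up to floating-point rounding. The streaming version never holds more than one score at a time, which is the property that blockwise GPU implementations build on when they process keys and values in tiles.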