Sparse Attention
Sparse attention techniques aim to improve the efficiency of transformer-based models, particularly large language models (LLMs), by reducing the computational cost of the attention mechanism from quadratic to linear or near-linear complexity in sequence length. Current research focuses on novel algorithms and architectures, such as dynamic sparse attention, hierarchical pruning, and various forms of token selection and merging, that achieve this efficiency while minimizing performance degradation. These advances matter because they enable the processing of longer sequences and larger models, improving both the scalability of LLMs and their applicability to resource-constrained environments.
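To make the core idea concrete, the sketch below (not tied to any specific paper) shows one common sparsity pattern: sliding-window attention, where each query attends only to a local band of keys rather than the full sequence. The window size and tensor shapes are illustrative assumptions.

```python
# Minimal sketch of a banded (sliding-window) sparse attention pattern.
# For clarity this builds the dense score matrix and masks it; efficient
# implementations compute only the in-window entries (e.g., blockwise),
# which is what reduces cost from O(n^2) toward O(n * w).
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, w=64):
    """q, k, v: (batch, seq_len, dim). Each position attends only to
    the w tokens on either side of itself."""
    b, n, d = q.shape
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (b, n, n) raw scores
    idx = torch.arange(n, device=q.device)
    # Banded mask: True outside the local window -> excluded from attention.
    mask = (idx[None, :] - idx[:, None]).abs() > w
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v             # (b, n, d)

q = k = v = torch.randn(1, 512, 64)
out = sliding_window_attention(q, k, v, w=64)        # shape (1, 512, 64)
```

Other sparse attention schemes follow the same template but choose the mask differently, for example by selecting globally important tokens, routing queries to learned clusters, or merging redundant tokens before computing attention.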