Sparse Attention
Sparse attention techniques improve the efficiency of transformer-based models, particularly large language models (LLMs), by reducing the cost of the attention mechanism from quadratic in sequence length to linear or near-linear. Current research focuses on algorithms and architectures, such as dynamic sparse attention, hierarchical pruning, and various forms of token selection and merging, that achieve this efficiency with minimal performance degradation. These advances matter because they enable the processing of longer sequences and larger models, improving both the scalability of LLMs and their applicability to resource-constrained environments.
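To make the idea concrete, below is a minimal sketch of one common family of sparse attention: per-query top-k token selection, where each query attends only to its k highest-scoring keys and all other positions are masked out before the softmax. The function name `topk_sparse_attention` and the specific selection rule are illustrative assumptions, not taken from any particular paper listed here; for clarity the sketch still materializes the full score matrix, whereas practical implementations avoid doing so to reach near-linear cost.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=16):
    """Per-query top-k sparse attention (illustrative sketch).

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    """
    d = q.size(-1)
    # Dense scores are computed here only for clarity; efficient kernels
    # would compute/select scores blockwise instead of materializing them.
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5

    # Threshold at the k-th largest score per query; everything below it
    # is masked to -inf so the softmax assigns it zero weight.
    k_eff = min(top_k, scores.size(-1))
    kth_score = scores.topk(k_eff, dim=-1).values[..., -1:]
    sparse_scores = scores.masked_fill(scores < kth_score, float("-inf"))

    weights = F.softmax(sparse_scores, dim=-1)
    return torch.matmul(weights, v)

# Usage: 2 sequences, 4 heads, 128 tokens, 64-dim heads.
q = torch.randn(2, 4, 128, 64)
k = torch.randn(2, 4, 128, 64)
v = torch.randn(2, 4, 128, 64)
out = topk_sparse_attention(q, k, v, top_k=16)
print(out.shape)  # torch.Size([2, 4, 128, 64])
```

Other sparse patterns mentioned above (hierarchical pruning, token merging, sliding-window or block-sparse masks) follow the same principle: restrict each query's attention to a small, structured subset of keys so the effective cost grows much more slowly than the full quadratic attention.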