Linear Attention

Linear attention mechanisms aim to improve the efficiency of Transformer models by reducing the computational complexity of the attention operation from quadratic to linear in sequence length, in both time and memory. Current research focuses on developing novel linear attention architectures, often via kernelization or state space modeling (for example, Mamba and Gated Linear Attention), and on integrating them into applications such as language modeling, image generation, and time series forecasting. These advances make it practical to scale Transformer-based models to longer sequences and higher-resolution data, benefiting fields that require efficient processing of large datasets.
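
To make the kernelization idea concrete, the sketch below contrasts standard softmax attention with a kernelized linear-attention variant in NumPy. It assumes a single attention head and uses the positive feature map phi(x) = elu(x) + 1 as one common choice from the linear-attention literature; the function names and shapes are illustrative, not taken from any particular library.

```python
# Minimal sketch of kernelized linear attention (single head, non-causal).
# Assumption: phi(x) = elu(x) + 1 as the positive feature map; names/shapes
# are illustrative only.
import numpy as np

def phi(x):
    # elu(x) + 1: x + 1 for x > 0, exp(x) otherwise (kept positive).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def softmax_attention(Q, K, V):
    # Standard attention: O(n^2) time and memory in sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Kernelized attention: phi(Q) (phi(K)^T V) / (phi(Q) sum_j phi(K_j)).
    # Computing phi(K)^T V first costs O(n * d^2) time and O(d^2) memory.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                      # (d, d_v) summary of keys and values
    z = Kp.sum(axis=0)                 # (d,) normalizer term
    return (Qp @ kv) / (Qp @ z)[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 512, 64                     # sequence length, head dimension
    Q, K, V = (rng.standard_normal((n, d)) * 0.1 for _ in range(3))
    out_quadratic = softmax_attention(Q, K, V)
    out_linear = linear_attention(Q, K, V)
    print(out_quadratic.shape, out_linear.shape)            # both (512, 64)
    print(np.abs(out_quadratic - out_linear).max())          # approximation gap
```

Because phi(K)^T V is a fixed-size d-by-d_v summary, the cost grows linearly with sequence length rather than quadratically; in the causal (autoregressive) setting this summary becomes a running state updated token by token, which is what links kernelized attention to the recurrent, state-space view used by models like Mamba and Gated Linear Attention.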

Papers