Linear Attention
Linear attention mechanisms aim to improve the efficiency of Transformer models by reducing the computational complexity of the attention operation from quadratic to linear in sequence length, in both time and memory. Current research focuses on developing new linear attention architectures, such as Mamba and Gated Linear Attention, and integrating them into applications including language modeling, image generation, and time series forecasting, typically via kernelization or state space modeling. These advances make it practical to scale Transformer-based models to longer sequences and higher-resolution inputs, benefiting any domain that requires efficient processing of large amounts of sequential data.
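To make the kernelization idea concrete, the sketch below shows a minimal non-causal linear attention in NumPy. It is an illustrative assumption, not code from any of the papers listed here: the feature map phi(x) = elu(x) + 1 follows the standard kernelized linear attention formulation, and the function and variable names are chosen for this example. Replacing softmax(QK^T)V with phi(Q)(phi(K)^T V) avoids forming the n x n attention matrix, so the cost is linear in the sequence length n.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1, a common choice
    # in kernelized linear attention (illustrative assumption).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """Non-causal kernelized linear attention.

    Standard attention computes softmax(Q K^T) V at O(n^2 d) cost.
    Here we compute phi(Q) (phi(K)^T V) instead, which costs O(n d^2):
    linear in the sequence length n.
    Shapes: Q, K are (n, d); V is (n, d_v).
    """
    Qp, Kp = elu_plus_one(Q), elu_plus_one(K)   # (n, d) feature-mapped queries/keys
    kv = Kp.T @ V                               # (d, d_v), summed over all positions
    z = Kp.sum(axis=0)                          # (d,) normalizer term
    return (Qp @ kv) / (Qp @ z + eps)[:, None]  # (n, d_v)

# Toy usage: no n x n matrix is ever materialized.
rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

A causal (autoregressive) variant replaces the global sums `kv` and `z` with running prefix sums over positions, which is the recurrent, state-space-like view exploited by architectures such as Gated Linear Attention.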
Papers
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao, Xinggang Wang, Lianghui Zhu, Qian Zhang, Chang Huang
Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
Zhen Qin, Xuyang Shen, Dong Li, Weigao Sun, Stan Birchfield, Richard Hartley, Yiran Zhong
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong