Transformer Architecture
Transformer architectures are a dominant deep learning paradigm, known primarily for the self-attention mechanism that enables effective modeling of sequential data such as text and time series. Current research focuses on mitigating the quadratic cost of self-attention, both through alternative architectures (e.g., state space models such as Mamba) and through optimized attention algorithms (e.g., local attention, quantized attention), and on extending transformers to diverse domains including computer vision, robotics, and blockchain technology. These efforts aim to improve the efficiency, scalability, and interpretability of transformers, broadening their applicability and enhancing performance across numerous fields.
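To make the quadratic-cost point concrete, below is a minimal, illustrative NumPy sketch (not code from any of the listed papers): vanilla self-attention materializes an n-by-n score matrix, so time and memory grow quadratically with sequence length, while a local (windowed) variant restricts each query to a fixed neighborhood, reducing the per-query work. All function names, shapes, and the window size are assumptions chosen for illustration.

```python
# Illustrative sketch only: full self-attention (O(n^2)) vs. a simple
# local/windowed variant (O(n * window)). Not taken from any listed paper.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_self_attention(q, k, v):
    # scores has shape (n, n): every query attends to every key,
    # hence quadratic time and memory in the sequence length n.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

def local_self_attention(q, k, v, window=4):
    # Each query attends only to keys within +/- `window` positions,
    # so the cost scales with n * window rather than n^2.
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)
        out[i] = softmax(scores) @ v[lo:hi]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 16, 8  # toy sequence length and head dimension
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    print(full_self_attention(q, k, v).shape)   # (16, 8)
    print(local_self_attention(q, k, v).shape)  # (16, 8)
```

The sketch omits multi-head projections, masking, and batching; it only contrasts the shape of the attention computation that the efficiency-oriented work above targets.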
Papers
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Awni Altabaa, John Lafferty
Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation
Yeachan Park, Minseok Kim, Yeoneung Kim
Reality Only Happens Once: Single-Path Generalization Bounds for Transformers
Yannick Limmer, Anastasis Kratsios, Xuwei Yang, Raeid Saqur, Blanka Horvath
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran, Duy M. H. Nguyen, Duy M. Nguyen, Trung-Tin Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert
Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion
Zizhao Hu, Mohammad Rostami
Your Transformer is Secretly Linear
Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Nikolai Gerasimenko, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang