Transformer Architecture
Transformer architectures are a dominant deep learning paradigm, best known for the self-attention mechanism, which lets every position in a sequence attend to every other position and makes them effective on sequential data such as text and time series. Current research addresses the quadratic time complexity of self-attention in sequence length, both through alternative architectures (e.g., state space models such as Mamba) and through optimized algorithms (e.g., local attention, quantized attention), and explores applications of transformers in domains including computer vision, robotics, and blockchain technology. These efforts aim to improve the efficiency, scalability, and interpretability of transformers and to broaden the range of tasks they can handle.
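To make the quadratic cost concrete, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not the implementation of any paper listed here: the function name `self_attention`, the toy sizes `n` and `d`, and the use of the raw inputs as queries, keys, and values (no learned projections) are all hypothetical choices made for brevity.

```python
import math
import random

def self_attention(q, k, v):
    """Single-head scaled dot-product attention on lists of vectors.

    q, k, v: lists of n vectors (lists of floats).
    Returns (outputs, weights), where weights is an n x n matrix.
    """
    d = len(k[0])
    outputs, weights = [], []
    for qi in q:  # n query positions ...
        # ... each scored against all n keys, so n * n scores in total
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]  # softmax over the keys
        weights.append(row)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * vj[c] for w, vj in zip(row, v))
                        for c in range(len(v[0]))])
    return outputs, weights

random.seed(0)
n, d = 6, 4  # toy sequence length and model dimension (assumed values)
x = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
# Assumption: identity projections, so queries, keys, and values are all x.
out, weights = self_attention(x, x, x)
print(len(weights), len(weights[0]))  # the weight matrix is n x n
```

The `weights` matrix has one entry per pair of positions, so doubling the sequence length quadruples both the work and the memory for this step. Approaches such as local attention reduce the cost by scoring each query against only a subset of keys.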
Papers
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
Ziyang Wu, Tianjiao Ding, Yifu Lu, Druv Pai, Jingyuan Zhang, Weida Wang, Yaodong Yu, Yi Ma, Benjamin D. Haeffele
xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition
Artyom Stitsyuk, Jaesik Choi
Fast Gradient Computation for RoPE Attention in Almost Linear Time
Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song