Transformer Architecture
Transformer architectures are a dominant deep learning paradigm, best known for the self-attention mechanism that lets them model dependencies across sequential data such as text and time series. Because self-attention scales quadratically with sequence length, current research focuses on alternative architectures (e.g., state space models such as Mamba) and more efficient attention algorithms (e.g., local attention, quantized attention), as well as on applying transformers to diverse domains including computer vision, robotics, and blockchain technology. These efforts aim to improve the efficiency, scalability, and interpretability of transformers, enabling broader applicability and stronger performance across many fields.
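To make the quadratic cost concrete, the sketch below is a minimal, unbatched single-head self-attention written in NumPy. It is an illustrative toy (the function name, shapes, and random projections are assumptions, not code from any of the papers listed here); the point is that the intermediate score matrix has shape (n, n), so time and memory grow quadratically with sequence length n.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Minimal single-head self-attention (no batching, no masking).

    X          : (n, d_model) input sequence
    Wq, Wk, Wv : (d_model, d_k) projection matrices
    Returns an (n, d_k) output. The intermediate score matrix is (n, n),
    which is the source of the quadratic time and memory cost.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n) -- quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

# Toy usage: sequence length n = 8, model width d_model = 16, head width d_k = 4
rng = np.random.default_rng(0)
n, d_model, d_k = 8, 16, 4
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (8, 4)
```

Approaches such as local attention restrict each position to a fixed-size window of keys (shrinking the score matrix), while state space models avoid forming pairwise scores altogether.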
Papers
Transformers for scientific data: a pedagogical review for astronomers
Dimitrios Tanoglidis, Bhuvnesh Jain, Helen Qu
Deep learning based on Transformer architecture for power system short-term voltage stability assessment with class imbalance
Yang Li, Jiting Cao, Yan Xu, Lipeng Zhu, Zhao Yang Dong
Superiority of Softmax: Unveiling the Performance Edge Over Linear Attention
Yichuan Deng, Zhao Song, Tianyi Zhou
The Efficacy of Transformer-based Adversarial Attacks in Security Domains
Kunyang Li, Kyle Domico, Jean-Charles Noirot Ferrand, Patrick McDaniel
BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei