Transformer Architecture
Transformer architectures are a dominant deep learning paradigm, built around a self-attention mechanism that models dependencies across sequential data such as text and time series. Current research focuses on addressing the quadratic time complexity of self-attention, both through alternative architectures (e.g., state space models such as Mamba) and through optimized attention algorithms (e.g., local attention, quantized attention), as well as on extending transformers to diverse domains including computer vision, robotics, and blockchain technology. These efforts aim to improve the efficiency, scalability, and interpretability of transformers, broadening their applicability and enhancing performance across numerous fields.
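For context on the quadratic cost mentioned above, the following is a minimal sketch of scaled dot-product self-attention in plain NumPy (single head, no masking or batching; the function and variable names are illustrative and not taken from any paper listed below). The score matrix has one entry per pair of positions, so memory and compute scale with the square of the sequence length n.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (n, d) token embeddings; Wq/Wk/Wv: (d, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # each (n, d_k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n) -- quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key axis
    return weights @ V                               # (n, d_k) context vectors

# Illustrative usage: n = 4 tokens, model width d = 8, head width d_k = 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

The alternatives surveyed in the papers below (state space models, local attention, quantized attention) target exactly this (n, n) score matrix, either replacing it or restricting and compressing it.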
Papers
Introduction to Transformers: an NLP Perspective
Tong Xiao, Jingbo Zhu
Transformer Based Model for Predicting Rapid Impact Compaction Outcomes: A Case Study of Utapao International Airport
Sompote Youwai, Sirasak Detcheewa
PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with Confidence-Level Prediction and Pose Tokens
Sebastian Stapf, Tobias Bauernfeind, Marco Riboldi
Not all layers are equally as important: Every Layer Counts BERT
Lucas Georges Gabriel Charpentier, David Samuel
Multi-scale Time-stepping of Partial Differential Equations with Transformers
AmirPouya Hemmasian, Amir Barati Farimani
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
Jianlei Yang, Jiacheng Liao, Fanding Lei, Meichen Liu, Junyi Chen, Lingkun Long, Han Wan, Bei Yu, Weisheng Zhao