Transformer Architecture
Transformer architectures are a dominant deep learning paradigm, known primarily for the self-attention mechanism that enables highly parallel modeling of sequential data such as text and time series. Current research focuses on addressing the quadratic cost of self-attention in sequence length, both through alternative architectures (e.g., state space models such as Mamba) and through optimized attention algorithms (e.g., local attention, quantized attention), as well as on extending transformers to diverse domains including computer vision, robotics, and blockchain technology. These efforts aim to improve the efficiency, scalability, and interpretability of transformers, broadening their applicability and performance across fields.
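As a point of reference for the complexity discussion above, the sketch below contrasts standard scaled dot-product self-attention, whose pairwise score matrix grows quadratically with sequence length, with a simple sliding-window (local) attention variant that restricts each query to a fixed neighborhood. This is a minimal NumPy illustration and is not taken from any of the papers listed below; the function names and the window size are illustrative choices.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard self-attention: the (n x n) score matrix is the source
    of the quadratic time and memory cost in sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # shape (n, n) -> O(n^2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # shape (n, d)

def local_attention(Q, K, V, window=4):
    """Sliding-window (local) attention: each query attends only to a
    fixed-size neighborhood, reducing cost to roughly O(n * window)."""
    n, d = Q.shape
    out = np.empty_like(V)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        s = Q[i] @ K[lo:hi].T / np.sqrt(d)              # scores for the window only
        w = np.exp(s - s.max())
        w /= w.sum()
        out[i] = w @ V[lo:hi]
    return out

# Toy sequence of n=8 tokens with d=16 features.
rng = np.random.default_rng(0)
n, d = 8, 16
Q, K, V = rng.normal(size=(3, n, d))
full = scaled_dot_product_attention(Q, K, V)
local = local_attention(Q, K, V, window=2)
print(full.shape, local.shape)   # (8, 16) (8, 16)
```

The local variant keeps the output shape identical while touching only a band of the score matrix, which is the basic trade-off behind many of the efficient-attention methods surveyed above.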
Papers
Selective Attention: Enhancing Transformer through Principled Context Control
Xuechen Zhang, Xiangyu Chang, Mingchen Li, Amit Roy-Chowdhury, Jiasi Chen, Samet Oymak
Comparing Prior and Learned Time Representations in Transformer Models of Timeseries
Natalia Koliou, Tatiana Boura, Stasinos Konstantopoulos, George Meramveliotakis, George Kosmadakis
P$^2$ Law: Scaling Law for Post-Training After Model Pruning
Xiaodong Chen, Yuxuan Hu, Xiaokang Zhang, Yanling Wang, Cuiping Li, Hong Chen, Jing Zhang
Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention
Libo Wang
Building 6G Radio Foundation Models with Transformer Architectures
Ahmed Aboulfotouh, Ashkan Eshaghbeigi, Hatem Abou-Zeid