Transformer Architecture
Transformer architectures are a dominant deep learning paradigm, known primarily for the self-attention mechanism, which enables effective modeling of sequential data such as text and time series. Current research focuses on mitigating the quadratic cost of self-attention in sequence length, both through alternative architectures (e.g., state space models such as Mamba) and through optimized attention algorithms (e.g., local attention, quantized attention), as well as on extending transformers to diverse domains including computer vision, robotics, and blockchain technology. These efforts aim to improve the efficiency, scalability, and interpretability of transformers, broadening their applicability and enhancing performance across many fields.
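To make the complexity argument concrete, the toy NumPy sketch below (not taken from any paper in the list; the function names full_attention and local_attention are invented for illustration) contrasts standard scaled dot-product attention, whose score matrix grows quadratically with sequence length, with a simple local-attention variant that restricts each query to a fixed window of keys and therefore scales linearly in sequence length.

```python
# Minimal sketch: full self-attention vs. windowed "local" attention.
# Assumes single-head attention over a sequence of n token vectors of dimension d.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    """Scaled dot-product attention: builds an (n, n) score matrix, i.e. O(n^2)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (n, n) -- the quadratic term
    return softmax(scores) @ V

def local_attention(Q, K, V, window=4):
    """Each query attends only to keys within +/- `window` positions: O(n * window)."""
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)   # at most 2*window + 1 scores per query
        out[i] = softmax(scores) @ V[lo:hi]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 16, 8                               # toy sequence length and head dimension
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(full_attention(Q, K, V).shape)       # (16, 8)
    print(local_attention(Q, K, V).shape)      # (16, 8)
```

This is only one of the mitigation strategies mentioned above; quantized attention and state space models such as Mamba take different routes to reducing the same cost.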
Papers
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami
A low latency attention module for streaming self-supervised speech representation learning
Jianbo Ma, Siqi Pan, Deepak Chandran, Andrea Fanelli, Richard Cartwright
A Study on ReLU and Softmax in Transformer
Kai Shen, Junliang Guo, Xu Tan, Siliang Tang, Rui Wang, Jiang Bian
Encoding Sentence Position in Context-Aware Neural Machine Translation with Concatenation
Lorenzo Lupo, Marco Dinarelli, Laurent Besacier
One Transformer for All Time Series: Representing and Training with Time-Dependent Heterogeneous Tabular Data
Simone Luetto, Fabrizio Garuti, Enver Sangineto, Lorenzo Forni, Rita Cucchiara