Transformer
Transformer models are being extensively investigated for a wide range of sequence-processing tasks, moving beyond natural language processing to encompass time-series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization (particularly to sequences longer than those seen in training), and understanding the mechanisms underlying in-context learning. These advances have significant implications across fields, improving the accuracy and efficiency of many applications while deepening our theoretical understanding of these models.
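The common ingredient across the papers below is the attention mechanism at the heart of every transformer. As a minimal sketch (not taken from any specific paper listed here), scaled dot-product self-attention can be written in a few lines of NumPy; the function name and toy dimensions are illustrative assumptions:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V, the core transformer operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarities
    # numerically stable row-wise softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted mix of value rows

# Toy example: 3 tokens with 4-dimensional embeddings; self-attention uses Q = K = V.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4): one output vector per input token
```

Real implementations add learned projection matrices for Q, K, and V, multiple heads, and masking, but this single operation is what the efficiency, generalization, and interpretability work surveyed above is probing.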
Papers
Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers
Patrik Zavoral, Dušan Variš, Ondřej Bojar
Learning Graph Quantized Tokenizers for Transformers
Limei Wang, Kaveh Hassani, Si Zhang, Dongqi Fu, Baichuan Yuan, Weilin Cong, Zhigang Hua, Hao Wu, Ning Yao, Bo Long
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Souza Leite, Henry Mauranen, Aziza Zhanabatyrova, Yu Xiao
360U-Former: HDR Illumination Estimation with Panoramic Adapted Vision Transformers
Jack Hilliard, Adrian Hilton, Jean-Yves Guillemaut
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, Ang Li, Dong Yu
On the Training Convergence of Transformers for In-Context Classification
Wei Shen, Ruida Zhou, Jing Yang, Cong Shen
Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers
Davide Celestini, Amirhossein Afsharrad, Daniele Gammelli, Tommaso Guffanti, Gioele Zardini, Sanjay Lall, Elisa Capello, Simone D'Amico, Marco Pavone
On Rank-Dependent Generalisation Error Bounds for Transformers
Lan V. Truong
How Transformers Implement Induction Heads: Approximation and Optimization Analysis
Mingze Wang, Ruoxi Yu, Weinan E, Lei Wu
Optimizing Encoder-Only Transformers for Session-Based Recommendation Systems
Anis Redjdal, Luis Pinto, Michel Desmarais
Predicting Chess Puzzle Difficulty with Transformers
Szymon Miłosz, Paweł Kapusta
A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers
Alex Stein, Samuel Sharpe, Doron Bergman, Senthil Kumar, Bayan Bruss, John Dickerson, Tom Goldstein, Micah Goldblum
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, Song Han