Transformer Models
Transformer models are being investigated for a wide range of sequence-processing tasks, extending beyond natural language processing to time-series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), strengthening generalization (particularly to longer sequences), and understanding the mechanisms behind in-context learning. These advances improve the accuracy and efficiency of applications across many fields while deepening the theoretical understanding of these models.
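To ground the summary above, here is a minimal, illustrative sketch of the scaled dot-product self-attention operation that underlies the Transformer architectures studied in the papers below. The NumPy implementation, function names, and toy dimensions are assumptions chosen for clarity; this is not code from any of the listed papers.

```python
# Minimal self-attention sketch (illustrative only, not from a listed paper).
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Pairwise token similarities, scaled by sqrt(d_head).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Each output token is an attention-weighted sum of the value vectors.
    return softmax(scores) @ v

# Toy usage with random weights: 4 tokens, model width 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)
```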
Papers
Predicting Chess Puzzle Difficulty with Transformers
Szymon Miłosz, Paweł Kapusta
A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers
Alex Stein, Samuel Sharpe, Doron Bergman, Senthil Kumar, C. Bayan Bruss, John Dickerson, Tom Goldstein, Micah Goldblum
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, Song Han
Extracting Finite State Machines from Transformers
Rik Adriaensen, Jaron Maene
Core Tokensets for Data-efficient Sequential Training of Transformers
Subarnaduti Paul, Manuel Brack, Patrick Schramowski, Kristian Kersting, Martin Mundt
Towards Robust Spacecraft Trajectory Optimization via Transformers
Yuji Takubo, Tommaso Guffanti, Daniele Gammelli, Marco Pavone, Simone D'Amico
Transformers learn variable-order Markov chains in-context
Ruida Zhou, Chao Tian, Suhas Diggavi
Transformers are Efficient Compilers, Provably
Xiyu Zhai, Runlong Zhou, Liao Zhang, Simon Shaolei Du
PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners
Yujin Tang, Lu Qi, Fei Xie, Xiangtai Li, Chao Ma, Ming-Hsuan Yang