Transformer Models
Transformer models are being investigated for a wide range of sequence processing tasks, moving beyond natural language processing into time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization, particularly to sequences longer than those seen during training, and understanding the mechanisms underlying in-context learning. These advances improve the accuracy and efficiency of many downstream applications while deepening the theoretical understanding of transformer models.
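As a concrete illustration of the efficiency theme, the minimal sketch below runs a single PyTorch transformer encoder layer under bfloat16 autocast. This is plain mixed-precision execution, a simpler cousin of the mixed-precision quantization the papers study, and the layer sizes, device choice, and use of torch.autocast are illustrative assumptions of this sketch, not drawn from any paper listed here.

```python
import torch
import torch.nn as nn

# Illustrative sizes; nothing here comes from the papers listed below.
block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
block.eval()  # disable dropout for a deterministic forward pass
x = torch.randn(2, 16, 64)  # (batch, sequence length, embedding dim)

# autocast runs matmul-heavy attention/feed-forward ops in bfloat16 while
# keeping numerically sensitive ops (e.g., LayerNorm) in float32.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = block(x)

print(y.shape)  # torch.Size([2, 16, 64])
```

The same context manager wraps a full model's forward pass unchanged, which is why mixed precision is a common first step before more aggressive techniques such as ternary or learned quantization.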
Papers
Transformers Can Do Arithmetic with the Right Embeddings
Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
Demystifying amortized causal discovery with transformers
Francesco Montagna, Max Cairney-Leeming, Dhanya Sridhar, Francesco Locatello
Automatic Domain Adaptation by Transformers in In-Context Learning
Ryuichiro Hataya, Kota Matsui, Masaaki Imaizumi
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
Xinyu Zhou, Boris Knyazev, Alexia Jolicoeur-Martineau, Jie Fu
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran, Duy M. H. Nguyen, Duy M. Nguyen, Trung-Tin Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert
Transformers represent belief state geometry in their residual stream
Adam S. Shai, Sarah E. Marzen, Lucas Teixeira, Alexander Gietelink Oldenziel, Paul M. Riechers
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, Haim Sompolinsky
UnitNorm: Rethinking Normalization for Transformers in Time Series
Nan Huang, Christian Kümmerle, Xiang Zhang
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu
Linking In-context Learning in Transformers to Human Episodic Memory
Li Ji-An, Corey Y. Zhou, Marcus K. Benna, Marcelo G. Mattar
TerDiT: Ternary Diffusion Models with Transformers
Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Yafei Wen, Shuai Ren, Peng Gao, Junchi Yan, Hongsheng Li
Transformers for Image-Goal Navigation
Nikhilanj Pelluri
How Do Transformers "Do" Physics? Investigating the Simple Harmonic Oscillator
Subhash Kantamneni, Ziming Liu, Max Tegmark
Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning
Jiuqi Wang, Ethan Blaser, Hadi Daneshmand, Shangtong Zhang
Next-token prediction capacity: general upper bounds and a lower bound for transformers
Liam Madden, Curtis Fox, Christos Thrampoulidis
Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers
Tobias Leemann, Alina Fastowski, Felix Pfeiffer, Gjergji Kasneci