Transformer Models
Transformer models are being investigated for a widening range of sequence-processing tasks, moving beyond natural language processing to time-series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization (particularly to sequences longer than those seen in training), and understanding the mechanisms behind in-context learning. These advances improve the accuracy and efficiency of applications across these fields while deepening theoretical understanding of how transformers compute and generalize.
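To make the in-context learning setting concrete, here is a minimal sketch (not taken from any of the papers listed below) of the standard synthetic setup several of them study: a small transformer is fed a prompt of (x, y) pairs drawn from a randomly sampled linear task and must predict y for a held-out query, with the task weights resampled every episode. The class and function names (ICLRegressor, sample_episode) and all hyperparameters are illustrative assumptions, not values from the listed work.

```python
# Sketch of in-context learning on random linear-regression tasks (PyTorch).
# Assumptions: toy dimensions, an encoder-only transformer, and a short demo
# training loop; real experiments in the literature vary all of these.
import torch
import torch.nn as nn

D_IN, D_MODEL, N_CTX = 8, 64, 16  # input dim, model width, context examples


def sample_episode(batch: int):
    """Sample a fresh linear task per batch element and build its prompt."""
    w = torch.randn(batch, D_IN, 1)              # task weights, hidden from the model
    x = torch.randn(batch, N_CTX + 1, D_IN)      # context inputs plus one query input
    y = (x @ w).squeeze(-1)                      # targets y_i = <w, x_i>
    tokens = torch.cat([x, y.unsqueeze(-1)], dim=-1)
    tokens[:, -1, -1] = 0.0                      # mask the query's label in the prompt
    return tokens, y[:, -1]                      # prompt sequence, true query label


class ICLRegressor(nn.Module):
    """Tiny encoder-only transformer that reads the prompt and predicts y_query."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(D_IN + 1, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=4, dim_feedforward=128, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.readout = nn.Linear(D_MODEL, 1)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))
        return self.readout(h[:, -1]).squeeze(-1)  # prediction at the query position


if __name__ == "__main__":
    model = ICLRegressor()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(200):                        # short demonstration run
        tokens, target = sample_episode(batch=32)
        loss = nn.functional.mse_loss(model(tokens), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"final MSE on freshly sampled tasks: {loss.item():.3f}")
```

Because the linear task changes every episode, driving the loss down forces the model to infer the task from the prompt at inference time rather than memorize any fixed weight vector, which is the behavior the in-context learning papers below analyze.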
Papers
A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert P. Dick, Hidenori Tanaka
Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers
Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Reduan Achtibat, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
Transformers are Minimax Optimal Nonparametric In-Context Learners
Juno Kim, Tai Nakamaki, Taiji Suzuki
Transformers As Approximations of Solomonoff Induction
Nathan Young, Michael Witbrock
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick, Kevin Y. Li, Eric P. Xing, J. Zico Kolter, Albert Gu
A Strategy to Combine 1stGen Transformers and Open LLMs for Automatic Text Classification
Claudio M. V. de Andrade, Washington Cunha, Davi Reis, Adriana Silvina Pagano, Leonardo Rocha, Marcos André Gonçalves
Transformer Explainer: Interactive Learning of Text-Generative Models
Aeree Cho, Grace C. Kim, Alexander Karpekov, Alec Helbling, Zijie J. Wang, Seongmin Lee, Benjamin Hoover, Duen Horng Chau
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen, Lei Zhao, Difan Zou