Transformer Megatron Decepticons
Transformer models are being studied for a wide range of sequence-processing tasks, moving beyond natural language processing into time-series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), strengthening generalization, particularly to sequences longer than those seen during training, and understanding the mechanisms behind in-context learning. These efforts improve accuracy and efficiency across application domains while deepening the theoretical understanding of how transformers compute.
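To make the efficiency theme concrete, below is a minimal NumPy sketch of weight quantization for a transformer layer. It uses uniform symmetric int8 quantization, whereas mixed-precision schemes assign different bit-widths to different layers; the matrix shape and function names are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for one transformer projection layer
# (shape chosen for illustration only).
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# The reconstruction error stays small relative to the weight magnitudes,
# while storage drops from 4 bytes to 1 byte per parameter.
print("max abs error:", np.max(np.abs(w - w_hat)))
print("bytes saved:", w.nbytes - q.nbytes)
```

A mixed-precision variant would run this per layer with a bit-width chosen by a sensitivity criterion (e.g., keeping attention output projections at higher precision), trading memory savings against accuracy loss.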
Papers
Multi-Human Mesh Recovery with Transformers
Zeyu Wang, Zhenzhen Weng, Serena Yeung-Levy
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
Hiroki Furuta, Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo
Transformers are Expressive, But Are They Expressive Enough for Regression?
Swaroop Nath, Harshad Khadilkar, Pushpak Bhattacharyya
The Impact of LoRA on the Emergence of Clusters in Transformers
Hugo Koubbi, Matthieu Boussard, Louis Hernandez
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
Clement Neo, Shay B. Cohen, Fazl Barez