Transformer Models
Transformer models are being investigated for a widening range of sequence-processing tasks, extending beyond natural language processing to time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), strengthening generalization to sequences longer than those seen during training, and explaining the mechanisms behind in-context learning. These advances improve the accuracy and efficiency of downstream applications while deepening the theoretical understanding of the models themselves.
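To make the efficiency theme concrete, here is a minimal sketch of one common interpretation of mixed precision: storing a linear layer's weights in int8 with a per-row scale while keeping activations in floating point. This is a generic illustration under that assumption, not the method of any paper listed below; the function names are invented for the example.

```python
import numpy as np

def quantize_per_channel(w: np.ndarray):
    """Symmetric int8 quantization with one scale per output row (hypothetical helper)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def int8_linear(x: np.ndarray, q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Compute y = x @ w.T, dequantizing the int8 weights on the fly."""
    return x @ (q.astype(np.float32) * scale).T

# Example: quantize a 16x64 weight matrix and apply it to a small batch.
w = np.random.randn(16, 64).astype(np.float32)
x = np.random.randn(2, 64).astype(np.float32)
q, s = quantize_per_channel(w)
y_approx = int8_linear(x, q, s)
```

Per-row scales typically preserve more accuracy than a single tensor-wide scale, because rows of a weight matrix can differ widely in magnitude.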
Papers
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, Rina Panigrahy
Memorized action chunking with Transformers: Imitation learning for vision-based tissue surface scanning
Bochen Yang, Kaizhong Deng, Christopher J Peters, George Mylonas, Daniel S. Elson
$k$NN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Themistoklis Haris
Frequency matters: Modeling irregular morphological patterns in Spanish with Transformers
Akhilesh Kakolu Ramarao, Kevin Tang, Dinah Baer-Henney
Interpretable Image Classification with Adaptive Prototype-based Vision Transformers
Chiyu Ma, Jon Donnelly, Wenjun Liu, Soroush Vosoughi, Cynthia Rudin, Chaofan Chen
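On the scalability thread, the "$k$NN Attention Demystified" paper above studies attention restricted to each query's k nearest keys. As a rough illustration of the general idea only (a dense top-k stand-in, not the paper's estimator):

```python
import numpy as np

def knn_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray, k: int) -> np.ndarray:
    """Attend only to each query's k highest-scoring keys.

    Dense top-k selection stands in for the approximate nearest-neighbor
    search a scalable implementation would use.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])              # (n_q, n_k) similarities
    topk = np.argpartition(scores, -k, axis=-1)[:, -k:]  # indices of k best keys per query
    masked = np.full_like(scores, -np.inf)               # drop all other keys
    np.put_along_axis(masked, topk,
                      np.take_along_axis(scores, topk, axis=-1), axis=-1)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))  # softmax over survivors
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V                                          # (n_q, d_v)
```

With Q of shape (n_q, d) and K, V of shape (n_k, d), `knn_attention(Q, K, V, k=32)` gives weight to only 32 keys per query. Note this sketch still builds the full score matrix; real implementations replace it with an approximate nearest-neighbor index to avoid the O(n_q * n_k) cost.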