Transformer Megatron Decepticons
Transformer models are being investigated for a growing range of sequence-processing tasks beyond natural language processing, including time-series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), strengthening generalization, particularly to sequences longer than those seen during training, and understanding the mechanisms underlying in-context learning. These advances improve the accuracy and efficiency of applications across diverse fields while deepening the theoretical understanding of transformer models.
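To make the efficiency theme concrete, the sketch below quantizes the linear layers of a small transformer encoder to int8 while keeping activations in floating point. This is a minimal, illustrative stand-in for the mixed-precision quantization work mentioned above, not a method from any of the listed papers; the model size and layer choices are arbitrary.

```python
# Minimal sketch: dynamic int8 quantization of a toy Transformer encoder.
# Hypothetical sizes; real models and quantization schemes differ.
import torch
import torch.nn as nn

# A tiny Transformer encoder standing in for a full model.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
).eval()

# Quantize the weights of all nn.Linear submodules to int8;
# activations remain in float, so precision is mixed at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 16, 64)   # (batch, sequence, embedding)
with torch.no_grad():
    out = quantized(x)
print(out.shape)             # torch.Size([1, 16, 64])
```

The quantized model trades a small amount of accuracy for lower memory footprint and faster linear-layer computation, which is the basic cost/benefit question the efficiency-oriented papers study at much larger scale.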
Papers
KnowFormer: Revisiting Transformers for Knowledge Graph Reasoning
Junnan Liu, Qianren Mao, Weifeng Jiang, Jianxin Li
Introducing the Large Medical Model: State of the art healthcare cost and risk prediction with transformers trained on patient event sequences
Ricky Sahu, Eric Marriott, Ethan Siegel, David Wagner, Flore Uzan, Troy Yang, Asim Javed
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling
Georgios Pantazopoulos, Malvina Nikandrou, Alessandro Suglia, Oliver Lemon, Arash Eshghi