Transformer Megatron Decepticons
Transformer models are being extensively investigated for various sequence processing tasks, moving beyond natural language processing to encompass time series forecasting, image recognition, and scientific computing applications like solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization capabilities (particularly to longer sequences), and understanding the underlying mechanisms of in-context learning. These advancements have significant implications for diverse fields, improving the accuracy and efficiency of numerous applications while simultaneously deepening our theoretical understanding of these powerful models.
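One of the efficiency directions mentioned above, mixed-precision quantization, typically works by storing weights in a low-precision integer format with a floating-point scale. The snippet below is a minimal illustrative sketch of symmetric per-tensor int8 quantization in plain Python; the function names and values are hypothetical and not drawn from any of the listed papers.

```python
# Illustrative sketch of symmetric int8 post-training quantization,
# one of the mixed-precision techniques used to cut Transformer
# inference cost. Names and values here are hypothetical examples.

def quantize_int8(weights):
    """Map float weights to int8 values with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and scale."""
    return [x * scale for x in q]

w = [0.42, -1.3, 0.07, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Round-trip error is bounded by half the quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, max_err <= s / 2)
```

In practice, production schemes refine this with per-channel scales and keep sensitive layers (e.g., attention softmax inputs) in higher precision, which is where the "mixed" in mixed-precision comes from.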
Papers
Learning Spectral Methods by Transformers
Yihan He, Yuan Cao, Hong-Yu Chen, Dennis Wu, Jianqing Fan, Han Liu
Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers
Bohang Sun, Pietro Liò
TabTreeFormer: Tree Augmented Tabular Data Generation using Transformers
Jiayu Li, Bingyin Zhao, Zilong Zhao, Kevin Yee, Uzair Javaid, Yingjie Lao, Biplab Sikdar
Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers
Lukas Kuhn, Sari Sadiya, Jorg Schlotterer, Christin Seifert, Gemma Roig
Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
Zhenyu Guo, Wenguang Chen
VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration
Stanislav Kirdey
Resource-Efficient Transformer Architecture: Optimizing Memory and Execution Time for Real-Time Applications
Krisvarish V, Priyadarshini T, K P Abhishek Sri Saai, Vaidehi Vijayakumar
Unified Local and Global Attention Interaction Modeling for Vision Transformers
Tan Nguyen, Coy D. Heldermon, Corey Toler-Franklin
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai, Nik Bear Brown
Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers
Dong Hoon Lee, Seunghoon Hong
Automated Image Captioning with CNNs and Transformers
Joshua Adrian Cahyono, Jeremy Nathan Jusuf
Efficient Large-Scale Traffic Forecasting with Transformers: A Spatial Data Management Perspective
Yuchen Fang, Yuxuan Liang, Bo Hui, Zezhi Shao, Liwei Deng, Xu Liu, Xinke Jiang, Kai Zheng