Transformer Megatron Decepticons
Transformer models are being extensively investigated for a wide range of sequence processing tasks, moving beyond natural language processing into time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization, particularly to sequences longer than those seen during training, and understanding the mechanisms underlying in-context learning. These advances improve the accuracy and efficiency of downstream applications while deepening theoretical understanding of the architecture itself.
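As background for the papers listed below, the sketch that follows shows the scaled dot-product attention operation at the core of Transformer models. It is a minimal, illustrative NumPy implementation; the function names, shapes, and toy data are assumptions for demonstration and are not taken from any of the listed papers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays of queries, keys, and values.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise query-key similarity, scaled by sqrt(d_k)
    weights = softmax(scores, axis=-1)   # each query attends over all keys
    return weights @ V                   # weighted sum of value vectors

# Toy example (hypothetical sizes): 4 tokens, one 8-dimensional head.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

In a full Transformer this operation is applied per attention head after linear projections of the input, which is where much of the efficiency and length-generalization work summarized above intervenes (e.g., by quantizing the projections or modifying how positions enter the scores).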
Papers
IG-CFAT: An Improved GAN-Based Framework for Effectively Exploiting Transformers in Real-World Image Super-Resolution
Alireza Aghelan, Ali Amiryan, Abolfazl Zarghani, Modjtaba Rouhani
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher
Federating to Grow Transformers with Constrained Resources without Model Sharing
Shikun Shen, Yifei Zou, Yuan Yuan, Yanwei Zheng, Peng Li, Xiuzhen Cheng, Dongxiao Yu
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain, Martin R. Oswald, Cees G. M. Snoek, Xinlei Chen
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
Satwik Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade
Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers
Zhuolin Fu
Transformers meet Neural Algorithmic Reasoners
Wilfried Bounsi, Borja Ibarz, Andrew Dudzik, Jessica B. Hamrick, Larisa Markeeva, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković
Generative Inverse Design of Crystal Structures via Diffusion Models with Transformers
Izumi Takahara, Kiyou Shibata, Teruyasu Mizoguchi
AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers
Emil Biju, Anirudh Sriram, Mert Pilanci
ReduceFormer: Attention with Tensor Reduction by Summation
John Yang, Le An, Su Inn Park
GridPE: Unifying Positional Encoding in Transformers with a Grid Cell-Inspired Framework
Boyang Li, Yulin Wu, Nuoxian Huang, Wenjia Zhang
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
Zixuan Wang, Stanley Wei, Daniel Hsu, Jason D. Lee