Transformers
Transformer models are being extensively investigated for sequence processing tasks well beyond natural language processing, including time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization, particularly to sequences longer than those seen in training, and understanding the mechanisms behind in-context learning. This work improves the accuracy and efficiency of applications across these fields while deepening our theoretical understanding of the architecture.
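Several of the papers below study positional encoding and length generalization. As a rough illustration of the core idea, and not the method of any specific paper listed here, the following minimal sketch contrasts standard consecutive positions with positions subsampled from a larger range, the intuition behind randomized positional encodings: training sequences then exercise position values that only longer test sequences would otherwise reach. The helper names (sinusoidal_positions, randomized_positions) are hypothetical.

```python
import numpy as np

def sinusoidal_positions(positions, d_model):
    """Standard sinusoidal encoding, evaluated at arbitrary integer positions."""
    positions = np.asarray(positions, dtype=np.float64)[:, None]  # (seq_len, 1)
    dims = np.arange(0, d_model, 2, dtype=np.float64)[None, :]    # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    enc = np.zeros((positions.shape[0], d_model))
    enc[:, 0::2] = np.sin(angles)  # even dimensions
    enc[:, 1::2] = np.cos(angles)  # odd dimensions
    return enc

def randomized_positions(seq_len, max_pos, rng):
    """Sample a sorted subset of a larger position range [0, max_pos),
    so short training sequences see position values from the longer range."""
    return np.sort(rng.choice(max_pos, size=seq_len, replace=False))

rng = np.random.default_rng(0)
# Consecutive positions 0..15 vs. 16 sorted positions drawn from [0, 512).
pe_standard = sinusoidal_positions(np.arange(16), d_model=64)
pe_randomized = sinusoidal_positions(randomized_positions(16, max_pos=512, rng=rng), d_model=64)
print(pe_standard.shape, pe_randomized.shape)  # (16, 64) (16, 64)
```

The encoding function itself is unchanged; only the positions fed into it differ, which is what lets a model trained on short inputs encounter the position values it will face at longer test-time lengths.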
Papers
Bytes Are All You Need: Transformers Operating Directly On File Bytes
Maxwell Horton, Sachin Mehta, Ali Farhadi, Mohammad Rastegari
A Universal Latent Fingerprint Enhancer Using Transformers
Andre Brasil Vieira Wyzykowski, Anil K. Jain
Toward Understanding Why Adam Converges Faster Than SGD for Transformers
Yan Pan, Yuanzhi Li
Humans in 4D: Reconstructing and Tracking Humans with Transformers
Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa, Jitendra Malik
LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction
Jeremiah Milbauer, Annie Louis, Mohammad Javad Hosseini, Alex Fabrikant, Donald Metzler, Tal Schuster
The Impact of Positional Encoding on Length Generalization in Transformers
Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Payel Das, Siva Reddy
Are Large Kernels Better Teachers than Transformers for ConvNets?
Tianjin Huang, Lu Yin, Zhenyu Zhang, Li Shen, Meng Fang, Mykola Pechenizkiy, Zhangyang Wang, Shiwei Liu
Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input
Shokichi Takakura, Taiji Suzuki
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi
Representation Learning on Hyper-Relational and Numeric Knowledge Graphs with Transformers
Chanyoung Chung, Jaejun Lee, Joyce Jiyoung Whang
Graph Inductive Biases in Transformers without Message Passing
Liheng Ma, Chen Lin, Derek Lim, Adriana Romero-Soriano, Puneet K. Dokania, Mark Coates, Philip Torr, Ser-Nam Lim
Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making
Aliyah R. Hsu, Yeshwanth Cherapanamjeri, Briton Park, Tristan Naumann, Anobel Y. Odisho, Bin Yu
How Powerful are Decoder-Only Transformer Neural Models?
Jesse Roberts
Randomized Positional Encodings Boost Length Generalization of Transformers
Anian Ruoss, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Róbert Csordás, Mehdi Bennani, Shane Legg, Joel Veness
Improving Position Encoding of Transformers for Multivariate Time Series Classification
Navid Mohammadi Foumani, Chang Wei Tan, Geoffrey I. Webb, Mahsa Salehi