Transformers
Transformer models are being extensively investigated for sequence processing tasks well beyond natural language processing, including time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (for example, through low-bit and mixed-precision quantization and optimized architectures), strengthening generalization (particularly to sequences longer than those seen during training), and understanding the mechanisms underlying in-context learning. Together, these advances improve the accuracy and efficiency of applications across many fields while deepening the theoretical understanding of the models themselves.
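As a concrete illustration of the efficiency theme (see BitNet in the paper list below, which scales transformers with 1-bit weights), here is a minimal NumPy sketch of sign-based weight binarization with an absmean scale. The helper names are hypothetical and the scheme is deliberately simplified; the actual BitNet recipe also quantizes activations and relies on training-time techniques such as straight-through estimators.

```python
import numpy as np

def binarize_weights(w):
    """Quantize a weight matrix to {-1, +1} with a single absmean scale
    (hypothetical helper; a simplification of 1-bit transformer weights)."""
    alpha = np.abs(w).mean()                 # per-tensor scaling factor
    w_bin = np.where(w >= 0, 1.0, -1.0)      # sign binarization
    return w_bin, alpha

def quantized_matmul(x, w_bin, alpha):
    # The 1-bit matmul: hardware-friendly x @ w_bin, rescaled afterwards.
    return alpha * (x @ w_bin)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
x = rng.normal(size=(8, 64))
w_bin, alpha = binarize_weights(w)
rel_err = (np.linalg.norm(x @ w - quantized_matmul(x, w_bin, alpha))
           / np.linalg.norm(x @ w))
print(f"relative matmul error at 1-bit precision: {rel_err:.3f}")
```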
Papers
An Ensemble Method Based on the Combination of Transformers with Convolutional Neural Networks to Detect Artificially Generated Text
Vijini Liyanage, Davide Buscaldi
Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression
Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan
Understanding Addition in Transformers
Philip Quirke, Fazl Barez
Sequence Length Independent Norm-Based Generalization Bounds for Transformers
Jacob Trauger, Ambuj Tewari
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox
Harnessing Dataset Cartography for Improved Compositional Generalization in Transformers
Osman Batur İnce, Tanin Zeraati, Semih Yagcioglu, Yadollah Yaghoobzadeh, Erkut Erdem, Aykut Erdem
Transformers for scientific data: a pedagogical review for astronomers
Dimitrios Tanoglidis, Bhuvnesh Jain, Helen Qu
Field-testing items using artificial intelligence: Natural language processing with transformers
Hotaka Maeda
Free-text Keystroke Authentication using Transformers: A Comparative Study of Architectures and Loss Functions
Saleh Momeni, Bagher BabaAli
BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
Quentin Bouniot, Ievgen Redko, Anton Mallasto, Charlotte Laclau, Karol Arndt, Oliver Struckmeier, Markus Heinonen, Ville Kyrki, Samuel Kaski
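For readers curious what "second-order convergence rates for in-context linear regression" (Fu et al., above) means concretely: the claim is that trained transformers behave like higher-order iterative solvers rather than like gradient descent. Below is a hedged NumPy sketch of the classic Newton-Schulz iteration for inverting X^T X, whose residual squares at every step; it illustrates the kind of second-order baseline involved, not the paper's own code, and the function name is hypothetical.

```python
import numpy as np

def newton_schulz_solve(X, y, iters=10, verbose=True):
    """Solve min_w ||Xw - y||^2 by approximating (X^T X)^{-1} with the
    Newton-Schulz iteration A <- A(2I - MA), whose residual ||I - MA||
    is squared at every step (a second-order convergence rate)."""
    M = X.T @ X
    eye = np.eye(M.shape[0])
    # Classic convergent initialization: A0 = M^T / (||M||_1 * ||M||_inf)
    A = M.T / (np.linalg.norm(M, 1) * np.linalg.norm(M, np.inf))
    for k in range(iters):
        A = A @ (2 * eye - M @ A)
        if verbose:
            print(f"iter {k}: ||I - MA|| = {np.linalg.norm(eye - M @ A):.3e}")
    return A @ (X.T @ y)

# Toy in-context regression task: noisy linear data, recover the weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=50)

w_hat = newton_schulz_solve(X, y)
w_ref = np.linalg.lstsq(X, y, rcond=None)[0]
print("max deviation from np.linalg.lstsq:", np.abs(w_hat - w_ref).max())
```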