Transformer Megatron Decepticons
Transformer models are being extensively investigated for sequence processing tasks beyond natural language processing, including time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization, particularly to sequences longer than those seen during training, and understanding the mechanisms underlying in-context learning. These advances improve the accuracy and efficiency of applications across these fields while deepening the theoretical understanding of the architecture.
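The efficiency theme above often amounts to running the attention and feed-forward blocks in reduced precision. As a minimal, hedged sketch of that idea (assuming a reasonably recent PyTorch; the model size, input shapes, and classification head are illustrative placeholders and are not drawn from any of the papers listed below), the following runs a small encoder-only Transformer under mixed precision with torch.autocast:

```python
# Minimal sketch: mixed-precision inference with a tiny encoder-only Transformer.
# All dimensions, the dummy input, and the 10-class head are hypothetical.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(
    d_model=64, nhead=4, dim_feedforward=128, batch_first=True
)
model = nn.TransformerEncoder(encoder_layer, num_layers=2)
head = nn.Linear(64, 10)  # hypothetical 10-class output head

device = "cuda" if torch.cuda.is_available() else "cpu"
model, head = model.to(device), head.to(device)

x = torch.randn(8, 32, 64, device=device)  # (batch, sequence length, features)

# autocast runs matmul-heavy ops in half precision where supported,
# trading a little numerical precision for speed and memory.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.no_grad(), torch.autocast(device_type=device, dtype=amp_dtype):
    features = model(x)                   # contextualized token representations
    logits = head(features.mean(dim=1))   # pool over the sequence, then classify

print(logits.shape)  # torch.Size([8, 10])
```

The same autocast pattern applies to training loops (paired with a gradient scaler for float16); full quantization of weights and activations, as studied in the efficiency papers below, goes further but follows the same precision/accuracy trade-off.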
Papers
1024m at SMM4H 2024: Tasks 3, 5 & 6 -- Ensembles of Transformers and Large Language Models for Medical Text Classification
Ram Mohan Rao Kadiyala, M.V.P. Chandra Sekhara Rao
Learning to Generate and Evaluate Fact-checking Explanations with Transformers
Darius Feher, Abdullah Khered, Hao Zhang, Riza Batista-Navarro, Viktor Schlegel
All You Need is an Improving Column: Enhancing Column Generation for Parallel Machine Scheduling via Transformers
Amira Hijazi, Osman Ozaltin, Reha Uzsoy
Generalized Probabilistic Attention Mechanism in Transformers
DongNyeong Heo, Heeyoul Choi
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers and Fully-Connected Neural Networks for Causally Constrained Predictions
Matthew J. Vowels, Mathieu Rochat, Sina Akbari
Provable In-context Learning for Mixture of Linear Regressions using Transformers
Yanhao Jin, Krishnakumar Balasubramanian, Lifeng Lai
Transfer Learning on Transformers for Building Energy Consumption Forecasting -- A Comparative Study
Robert Spencer, Surangika Ranathunga, Mikael Boulic, Andries van Heerden, Teo Susnjak
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu, Ruida Zhou, Cong Shen, Jing Yang
Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers
Patrik Zavoral, Dušan Variš, Ondřej Bojar
Learning Graph Quantized Tokenizers for Transformers
Limei Wang, Kaveh Hassani, Si Zhang, Dongqi Fu, Baichuan Yuan, Weilin Cong, Zhigang Hua, Hao Wu, Ning Yao, Bo Long
Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
Clayton Souza Leite, Henry Mauranen, Aziza Zhanabatyrova, Yu Xiao
360U-Former: HDR Illumination Estimation with Panoramic Adapted Vision Transformers
Jack Hilliard, Adrian Hilton, Jean-Yves Guillemaut
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, Ang Li, Dong Yu
On the Training Convergence of Transformers for In-Context Classification
Wei Shen, Ruida Zhou, Jing Yang, Cong Shen
Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers
Davide Celestini, Amirhossein Afsharrad, Daniele Gammelli, Tommaso Guffanti, Gioele Zardini, Sanjay Lall, Elisa Capello, Simone D'Amico, Marco Pavone
On Rank-Dependent Generalisation Error Bounds for Transformers
Lan V. Truong
How Transformers Implement Induction Heads: Approximation and Optimization Analysis
Mingze Wang, Ruoxi Yu, Weinan E, Lei Wu
Optimizing Encoder-Only Transformers for Session-Based Recommendation Systems
Anis Redjdal, Luis Pinto, Michel Desmarais