Transformer Model
Transformer models are a class of neural networks built around the attention mechanism, which lets them process sequential data such as text and time series effectively. Current research focuses on improving training stability (e.g., mitigating loss spikes), enhancing expressiveness through novel attention mechanisms and embedding techniques, and optimizing performance by exploring alternative architectures (e.g., hybrid Transformer-Mamba models) and parallelization strategies. This work matters because transformers are widely adopted across natural language processing, computer vision, scientific computing, and engineering, driving advances in both theoretical understanding and practical applications.
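For context, the sketch below implements the scaled dot-product attention operation the summary refers to, in plain NumPy. It is a minimal single-head, unbatched illustration; the function name, shapes, and test values are chosen here for brevity and are not taken from any of the listed papers.

```python
# Minimal sketch of scaled dot-product attention (illustrative names/shapes).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns outputs and weights."""
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled for numerical stability.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension yields the attention matrix.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```

In practice attention runs batched and per head, with learned linear projections of the input producing Q, K, and V; the sketch keeps only the core computation, and the attention matrix it returns is the object studied in papers such as "On Explaining with Attention Matrices" below.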
Papers
On Explaining with Attention Matrices
Omar Naim, Nicholas Asher
A Comprehensive Survey of Time Series Forecasting: Architectural Diversity and Open Challenges
Jongseon Kim, Hyungjoon Kim, HyunGi Kim, Dongjun Lee, Sungroh Yoon (Seoul National University; LG Chem; Samsung SDI)
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI
Fulu Li
Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers
Valeria Ruscio, Fabrizio Silvestri (a short RoPE sketch follows this list)
Locating Information in Large Language Models via Random Matrix Theory
Max Staats, Matthias Thamm, Bernd Rosenow
PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
Maximilian Augustin, Syed Shakib Sarwar, Mostafa Elhoushi, Sai Qian Zhang, Yuecheng Li, Barbara De Salvo
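The rotary-embedding paper by Ruscio and Silvestri concerns rotary position embeddings (RoPE), which encode position by rotating pairs of feature dimensions before the query-key dot product. The sketch below is a minimal NumPy rendering of the standard RoPE formulation, not the authors' code; the variable names and the conventional base of 10000 are assumptions made here.

```python
# Minimal sketch of rotary position embeddings (standard formulation).
import numpy as np

def apply_rope(x, base=10000.0):
    """Rotate interleaved feature pairs by position-dependent angles.

    x: (seq_len, d) with d even. Returns an array of the same shape.
    """
    seq_len, d = x.shape
    # One frequency per feature pair, geometrically spaced: base^(-2i/d).
    freqs = base ** (-np.arange(0, d, 2) / d)         # (d/2,)
    angles = np.outer(np.arange(seq_len), freqs)      # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    # 2-D rotation applied to each (x1, x2) pair.
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.random.default_rng(1).standard_normal((6, 8))
print(apply_rope(x).shape)  # (6, 8)
```

Because each pair is rotated by an angle proportional to its position, the dot product between a rotated query and a rotated key depends only on their relative offset, which is the relative-position property RoPE is known for.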