Transformer Model
Transformer models are a class of neural networks built around the attention mechanism, which enables them to process sequential data such as text and time series with remarkable effectiveness. Current research focuses on improving training stability (e.g., mitigating loss spikes), enhancing expressiveness through novel attention mechanisms and embedding techniques, and optimizing performance for specific applications by exploring alternative architectures (e.g., hybrid Transformer-Mamba models) and parallelization strategies. This work matters because transformers are now widely adopted across diverse fields, from natural language processing and computer vision to scientific computing and engineering, driving advances in both theoretical understanding and practical applications.
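To make the core idea concrete, the sketch below implements scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, in plain NumPy; the function name, shapes, and single-head setup are illustrative assumptions for this summary, not drawn from any of the papers listed below.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k); V: (seq_len, d_v). Shapes and names are
    illustrative assumptions, not taken from the papers below.
    """
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # to keep softmax inputs in a numerically stable range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights summing to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the value vectors.
    return weights @ V

# Tiny usage example with random inputs (hypothetical dimensions).
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 positions, head dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

In a full transformer this operation is applied in parallel across multiple heads and interleaved with feed-forward layers and residual connections; the sketch shows only the attention step that the summary above refers to.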
Papers
GADformer: A Transparent Transformer Model for Group Anomaly Detection on Trajectories
Andreas Lohrer, Darpan Malik, Claudius Zelenka, Peer Kröger
Transformers and Ensemble methods: A solution for Hate Speech Detection in Arabic languages
Angel Felipe Magnossão de Paula, Imene Bensalem, Paolo Rosso, Wajdi Zaghouani
Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection
Nikhil J. Dhinagar, Sophia I. Thomopoulos, Emily Laltoo, Paul M. Thompson
Input-length-shortening and text generation via attention values
Neşet Özkan Tan, Alex Yuxuan Peng, Joshua Bensemann, Qiming Bao, Tim Hartill, Mark Gahegan, Michael Witbrock