Transformer Model
Transformer models are a class of neural networks built on an attention mechanism, which lets them process sequential data such as text and time series with remarkable effectiveness. Current research focuses on improving training stability (e.g., mitigating loss spikes), enhancing expressiveness through novel attention mechanisms and embedding techniques, and optimizing performance for specific applications by exploring alternative architectures (e.g., hybrid Transformer-Mamba models) and parallelization strategies. This work matters because transformers are now widely adopted across diverse fields, from natural language processing and computer vision to scientific computing and engineering, driving advances in both theoretical understanding and practical applications.
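To make the attention mechanism mentioned above concrete, the snippet below is a minimal sketch of scaled dot-product attention, the core operation of the original Transformer; it uses plain NumPy, and the function name, shapes, and toy data are illustrative assumptions rather than code from any of the papers listed here.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention sketch.

    Q, K: arrays of shape (seq_len, d_k); V: array of shape (seq_len, d_v).
    Returns the attended values with shape (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled to keep the softmax stable.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension yields attention weights for each query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mixture of the value vectors.
    return weights @ V

# Toy usage: self-attention over 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```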
Papers
Transferring a molecular foundation model for polymer property predictions
Pei Zhang, Logan Kearney, Debsindhu Bhowmik, Zachary Fox, Amit K. Naskar, John Gounley
Divide et Impera: Multi-Transformer Architectures for Complex NLP-Tasks
Solveig Helland, Elena Gavagnin, Alexandre de Spindler
Understanding Code Semantics: An Evaluation of Transformer Models in Summarization
Debanjan Mondal, Abhilasha Lodha, Ankita Sahoo, Beena Kumari
Mixture of Tokens: Continuous MoE through Cross-Example Aggregation
Szymon Antoniak, Michał Krutul, Maciej Pióro, Jakub Krajewski, Jan Ludziejewski, Kamil Ciebiera, Krystian Król, Tomasz Odrzygóźdź, Marek Cygan, Sebastian Jaszczur
Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks
Sunit Bhattacharya, Ondrej Bojar