Transformer Model
Transformer models are a class of neural networks built around the attention mechanism, which lets them process sequential data such as text and time series with remarkable effectiveness. Current research focuses on improving training stability (e.g., mitigating loss spikes), enhancing expressiveness through novel attention mechanisms and embedding techniques, and optimizing performance across applications by exploring alternative architectures (e.g., hybrid Transformer-Mamba models) and parallelization strategies. This work matters because transformers are now widely adopted across diverse fields, from natural language processing and computer vision to scientific computing and engineering, driving advances in both theoretical understanding and practical deployment.
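As background for the papers below, the core attention operation these models are built on fits in a few lines. The following is a minimal NumPy sketch of standard scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V; the function name, array shapes, and toy inputs are illustrative choices, not drawn from any of the listed papers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch of attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity scores between queries and keys, scaled so their
    # magnitude does not grow with the model dimension.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
    # Numerically stabilized softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted average of the value vectors.
    return weights @ V

# Toy example: one sequence of 4 tokens with model dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 4, 8))
K = rng.normal(size=(1, 4, 8))
V = rng.normal(size=(1, 4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 4, 8)
```

The division by sqrt(d_k) keeps dot-product magnitudes roughly constant as dimensionality grows, preventing the softmax from saturating, which is one source of the training instabilities (such as loss spikes) mentioned above.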
Papers
Demystifying the Communication Characteristics for Distributed Transformer Models
Quentin Anthony, Benjamin Michalowicz, Jacob Hatef, Lang Xu, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda
R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation
Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang
A Strategy to Combine 1stGen Transformers and Open LLMs for Automatic Text Classification
Claudio M. V. de Andrade, Washington Cunha, Davi Reis, Adriana Silvina Pagano, Leonardo Rocha, Marcos André Gonçalves