Transformer Model
Transformer models are a class of neural networks built on an attention mechanism, which lets them process sequential data such as text and time series highly effectively. Current research focuses on improving training stability (e.g., mitigating loss spikes), enhancing expressiveness through novel attention mechanisms and embedding techniques, and optimizing performance by exploring alternative architectures (e.g., hybrid Transformer-Mamba models) and parallelization strategies. This work matters because transformers are widely adopted across diverse fields, from natural language processing and computer vision to scientific computing and engineering, advancing both theoretical understanding and practical applications.
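To make the attention mechanism mentioned above concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer layer. It is illustrative only and not drawn from any of the papers listed below; the dimensions, weight matrices, and function name are assumptions for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to all keys, yielding a weighted sum of values."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    # to keep softmax gradients stable as the dimension grows.
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    # Softmax over the key axis turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy example: a sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Because every token attends to every other token in a single step, the operation handles sequences without recurrence; this is the property the novel attention mechanisms and hybrid architectures surveyed here seek to refine or replace.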
Papers
Semantics of Multiword Expressions in Transformer-Based Models: A Survey
Filip Miletić, Sabine Schulte im Walde
Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models
Yunhong He, Jianling Qiu, Wei Zhang, Zhengqing Yuan
SkipViT: Speeding Up Vision Transformers with a Token-Level Skip Connection
Foozhan Ataiefard, Walid Ahmed, Habib Hajimolahoseini, Saina Asani, Farnoosh Javadi, Mohammad Hassanpour, Omar Mohamed Awad, Austin Wen, Kangling Liu, Yang Liu
Empathy and Distress Detection using Ensembles of Transformer Models
Tanmay Chavan, Kshitij Deshpande, Sheetal Sonawane
Structured World Representations in Maze-Solving Transformers
Michael Igorevich Ivanitskiy, Alex F. Spies, Tilman Räuker, Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia Diniz Behn, Katsumi Inoue, Samy Wu Fung