Transformer Model
Transformer models are a class of neural networks built on the attention mechanism, which lets them process sequential data such as text and time series effectively. Current research focuses on improving training stability (e.g., mitigating loss spikes), enhancing expressiveness through novel attention mechanisms and embedding techniques, and optimizing performance for specific applications by exploring alternative architectures (e.g., hybrid Transformer-Mamba models) and parallelization strategies. This work matters because Transformers are widely adopted across natural language processing, computer vision, scientific computing, and engineering, driving advances in both theoretical understanding and practical applications.
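Since the summary centers on the attention mechanism, a minimal NumPy sketch of scaled dot-product attention, the core operation underlying the architectures above, may be a useful reference; all variable names, dimensions, and weight matrices here are illustrative, not drawn from any of the listed papers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarities, scaled
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of value vectors

# Toy usage: a hypothetical 4-token sequence with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # stand-in token embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))  # illustrative projections
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (4, 8): one context-aware vector per token
```

In a full Transformer this operation is repeated across multiple heads and layers, with learned projection matrices in place of the random ones used here.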
Papers
Quantifying uncertainty in lung cancer segmentation with foundation models applied to mixed-domain datasets
Aneesh Rangnekar, Nishant Nadkarni, Jue Jiang, Harini Veeraraghavan
Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service
Mirza Alim Mutasodirin, Radityo Eko Prasojo, Achmad F. Abka, Hanif Rasyidi
Contextual Clarity: Generating Sentences with Transformer Models using Context-Reverso Data
Ruslan Musaev
Simulating Weighted Automata over Sequences and Trees with Transformers
Michael Rizvi, Maude Lizaire, Clara Lacroce, Guillaume Rabusseau
A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions
Quoc-Vinh Lai-Dang