Transformer Megatron Decepticons
Transformer models are being extensively investigated for sequence processing tasks well beyond natural language processing, including time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization to longer sequences, and understanding the mechanisms underlying in-context learning. These advances improve the accuracy and efficiency of applications across diverse fields while deepening theoretical understanding of how these models work.
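As a concrete anchor for the discussion above, the sketch below shows single-head scaled dot-product self-attention, the core operation shared by the transformer variants listed under Papers, in plain NumPy. It is a minimal illustration only; the function name, shapes, and projection matrices are illustrative assumptions and are not taken from any of the papers below.

import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of shape (seq_len, d_model).

    Wq, Wk, Wv project the input into query, key, and value spaces of
    dimension d_k. Returns the attended output of shape (seq_len, d_k).
    """
    Q = X @ Wq                       # queries: (seq_len, d_k)
    K = X @ Wk                       # keys:    (seq_len, d_k)
    V = X @ Wv                       # values:  (seq_len, d_k)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarities, scaled for numerical stability
    # Softmax over the key dimension so each row becomes a probability distribution.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted sum of value vectors

# Toy usage: a sequence of 5 tokens, model dimension 8, head dimension 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)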
Papers
Pre-training Transformers for Molecular Property Prediction Using Reaction Prediction
Johan Broberg, Maria Bånkestad, Erik Ylipää
Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation
Samuel Cognolato, Alberto Testolin
The Role of Complex NLP in Transformers for Text Ranking?
David Rau, Jaap Kamps
Transformers are Adaptable Task Planners
Vidhi Jain, Yixin Lin, Eric Undersander, Yonatan Bisk, Akshara Rai
Multi-Label Retinal Disease Classification using Transformers
M. A. Rodriguez, H. AlMarzouqi, P. Liatsis
OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers
Jialun Pei, Tianyang Cheng, Deng-Ping Fan, He Tang, Chuanbo Chen, Luc Van Gool
Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention
Gary Leung, Jun Gao, Xiaohui Zeng, Sanja Fidler
GaitForeMer: Self-Supervised Pre-Training of Transformers via Human Motion Forecasting for Few-Shot Gait Impairment Severity Estimation
Mark Endo, Kathleen L. Poston, Edith V. Sullivan, Li Fei-Fei, Kilian M. Pohl, Ehsan Adeli
Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding
Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu