Transformer Model
Transformer models are a class of neural networks built on the attention mechanism, which enables them to process sequential data such as text and time series with remarkable effectiveness. Current research focuses on improving training stability (e.g., mitigating loss spikes), enhancing expressiveness through novel attention mechanisms and embedding techniques, and optimizing performance for specific applications by exploring alternative architectures (e.g., hybrid Transformer-Mamba models) and parallelization strategies. This work matters because transformers are now widely adopted across diverse fields, from natural language processing and computer vision to scientific computing and engineering, driving advances in both theoretical understanding and practical application.
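For readers new to the topic, the common core of these models is scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. The minimal NumPy sketch below illustrates that canonical formulation only; the function name and toy shapes are illustrative and are not drawn from any of the papers listed here.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarities
        # numerically stable softmax over each row of scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V  # each output row is a weighted sum of value rows

    # Toy usage: 4 tokens with 8-dimensional embeddings (shapes chosen arbitrarily)
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((4, 8))
    K = rng.standard_normal((4, 8))
    V = rng.standard_normal((4, 8))
    out = scaled_dot_product_attention(Q, K, V)  # shape (4, 8)

Real transformer layers wrap this kernel with learned projections, multiple heads, masking, and residual connections, but the weighted-sum-of-values idea above is the piece the research directions mentioned earlier all build on.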
Papers
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li, Junyu Chen, Yucheng Tang, Ce Wang, Bennett A. Landman, S. Kevin Zhou
Modeling Image Composition for Complex Scene Generation
Zuopeng Yang, Daqing Liu, Chaoyue Wang, Jie Yang, Dacheng Tao
BayesFormer: Transformer with Uncertainty Estimation
Karthik Abinav Sankararaman, Sinong Wang, Han Fang
Romantic-Computing
Elizabeth Horishny
A comparative study between vision transformers and CNNs in digital pathology
Luca Deininger, Bernhard Stimpel, Anil Yuce, Samaneh Abbasi-Sureshjani, Simon Schönenberger, Paolo Ocampo, Konstanty Korski, Fabien Gaire
THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption
Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, Furu Wei