Pre Trained Transformer
Pre-trained transformer models are foundational neural networks achieving state-of-the-art results across diverse tasks by leveraging massive datasets for initial training, followed by fine-tuning for specific applications. Current research emphasizes improving efficiency, including parameter reduction techniques like low-rank factorization and early exit strategies, and exploring effective transfer learning methods across modalities (e.g., image to video, text to speech). This work is significant because it enables the application of powerful transformer architectures to resource-constrained settings and expands their utility beyond their original training domains, impacting fields from natural language processing and computer vision to medical image analysis and even military strategy.
Papers
Quantifying uncertainty in lung cancer segmentation with foundation models applied to mixed-domain datasets
Aneesh Rangnekar, Nishant Nadkarni, Jue Jiang, Harini Veeraraghavan
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu