Transformer Megatron Decepticons
Transformer models are being investigated for a growing range of sequence processing tasks, extending beyond natural language processing into time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization (particularly to sequences longer than those seen during training), and understanding the mechanisms underlying in-context learning. These advances improve the accuracy and efficiency of applications across these fields while deepening the theoretical understanding of transformer models.
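As a concrete illustration of the efficiency theme mentioned above, the sketch below applies post-training dynamic int8 quantization to a toy single-head self-attention block in PyTorch. The module, names, and dimensions (TinySelfAttention, d_model=64, etc.) are illustrative assumptions for this summary and are not taken from any of the listed papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySelfAttention(nn.Module):
    """Minimal single-head self-attention block (illustrative only)."""
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Scaled dot-product attention over the sequence dimension.
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.out_proj(attn @ v)

model = TinySelfAttention().eval()

# Post-training dynamic quantization: Linear weights are stored as int8,
# activations are quantized on the fly at inference time (CPU only).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 16, 64)  # (batch, sequence, embedding)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([2, 16, 64])
```

Dynamic quantization of this kind trades a small amount of accuracy for reduced weight memory and often faster CPU inference; the papers below explore more elaborate variants, such as quantization-aware and tensor-compressed training.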
Papers
Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions
Grant Sinha, Krish Parmar, Hilda Azimi, Amy Tai, Yuhao Chen, Alexander Wong, Pengcheng Xi
Mapping Researcher Activity based on Publication Data by means of Transformers
Zineddine Bettouche, Andreas Fischer
Understanding Parameter Sharing in Transformers
Ye Lin, Mingxuan Wang, Zhexi Zhang, Xiaohui Wang, Tong Xiao, Jingbo Zhu
Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding
Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang
Learning Sampling Dictionaries for Efficient and Generalizable Robot Motion Planning with Transformers
Jacob J Johnson, Ahmed H Qureshi, Michael Yip
Prediction of Post-Operative Renal and Pulmonary Complications Using Transformers
Reza Shirkavand, Fei Zhang, Heng Huang