Transformer Megatron Decepticons
Transformer models are being extensively investigated for various sequence processing tasks, moving beyond natural language processing to encompass time series forecasting, image recognition, and scientific computing applications like solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization capabilities (particularly to longer sequences), and understanding the underlying mechanisms of in-context learning. These advancements have significant implications for diverse fields, improving the accuracy and efficiency of numerous applications while simultaneously deepening our theoretical understanding of these powerful models.
Papers
Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing
Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff
AirFormer: Predicting Nationwide Air Quality in China with Transformers
Yuxuan Liang, Yutong Xia, Songyu Ke, Yiwei Wang, Qingsong Wen, Junbo Zhang, Yu Zheng, Roger Zimmermann
On Designing Light-Weight Object Trackers through Network Pruning: Use CNNs or Transformers?
Saksham Aggarwal, Taneesh Gupta, Pawan Kumar Sahu, Arnav Chavan, Rishabh Tiwari, Dilip K. Prasad, Deepak K. Gupta
MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction
Kevin Lin, Chung-Ching Lin, Lin Liang, Zicheng Liu, Lijuan Wang
TorchScale: Transformers at Scale
Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei
Completing point cloud from few points by Wasserstein GAN and Transformers
Xianfeng Wu, Jinhui Qian, Qing Wei, Xianzu Wu, Xinyi Liu, Luxin Hu, Yanli Gong, Zhongyuan Lai, Libing Wu
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh
Character-level White-Box Adversarial Attacks against Transformers via Attachable Subwords Substitution
Aiwei Liu, Honghai Yu, Xuming Hu, Shu'ang Li, Li Lin, Fukun Ma, Yawen Yang, Lijie Wen