Transformer Megatron Decepticons
Transformer models are being extensively investigated for a wide range of sequence processing tasks, extending beyond natural language processing to time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization, particularly to sequences longer than those seen during training, and understanding the mechanisms behind in-context learning. These advances improve the accuracy and efficiency of applications across these fields while deepening the theoretical understanding of how transformers work.
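As one concrete example of the efficiency-oriented architectural work summarized above (and the setting addressed by the "Weighted Grouped Query Attention in Transformers" paper listed below), the sketch that follows shows plain grouped-query attention, where several query heads share a single key/value head to shrink the KV cache. This is a minimal PyTorch illustration of the standard technique, not the weighted variant proposed in that paper; the tensor shapes and head counts are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, num_kv_heads):
    """
    Minimal grouped-query attention (GQA) sketch (assumed shapes, not from any listed paper).

    q:    (batch, num_q_heads, seq_len, head_dim)
    k, v: (batch, num_kv_heads, seq_len, head_dim)

    Each group of num_q_heads // num_kv_heads query heads shares one K/V head,
    reducing the key/value cache relative to standard multi-head attention.
    """
    batch, num_q_heads, seq_len, head_dim = q.shape
    group_size = num_q_heads // num_kv_heads

    # Repeat each K/V head so it aligns with its group of query heads.
    k = k.repeat_interleave(group_size, dim=1)  # -> (batch, num_q_heads, seq_len, head_dim)
    v = v.repeat_interleave(group_size, dim=1)

    # Scaled dot-product attention over the expanded heads.
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    weights = F.softmax(scores, dim=-1)
    return weights @ v

if __name__ == "__main__":
    # Toy usage: 8 query heads sharing 2 key/value heads.
    b, hq, hkv, t, d = 1, 8, 2, 16, 64
    q = torch.randn(b, hq, t, d)
    k = torch.randn(b, hkv, t, d)
    v = torch.randn(b, hkv, t, d)
    out = grouped_query_attention(q, k, v, num_kv_heads=hkv)
    print(out.shape)  # torch.Size([1, 8, 16, 64])
```

With 8 query heads and 2 key/value heads, the KV cache is a quarter of its multi-head size; setting num_kv_heads equal to the number of query heads recovers ordinary multi-head attention, and setting it to 1 recovers multi-query attention.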
Papers
What comes after transformers? -- A selective survey connecting ideas in deep learning
Johannes Schneider
Translating Imaging to Genomics: Leveraging Transformers for Predictive Modeling
Aiman Farooq, Deepak Mishra, Santanu Chaudhury
Improving Image De-raining Using Reference-Guided Transformers
Zihao Ye, Jaehoon Cho, Changjae Oh
On Initializing Transformers with Pre-trained Embeddings
Ha Young Kim, Niranjan Balasubramanian, Byungkon Kang
When can transformers compositionally generalize in-context?
Seijin Kobayashi, Simon Schug, Yassir Akram, Florian Redhardt, Johannes von Oswald, Razvan Pascanu, Guillaume Lajoie, João Sacramento
Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen
Weighted Grouped Query Attention in Transformers
Sai Sena Chinnakonduru, Astarag Mohapatra
Towards Scale-Aware Full Surround Monodepth with Transformers
Yuchen Yang, Xinyi Wang, Dong Li, Lu Tian, Ashish Sirasao, Xun Yang