Transformers
Transformer models are being investigated for a widening range of sequence processing tasks, moving beyond natural language processing to time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization, particularly to sequences longer than those seen during training, and understanding the mechanisms behind in-context learning. These advances improve the accuracy and efficiency of downstream applications while deepening the theoretical understanding of the models themselves.
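To make the efficiency thread above concrete, the sketch below illustrates the basic idea behind post-training int8 weight quantization, one family of mixed-precision techniques of the kind referenced in this overview. It is a minimal, illustrative example in NumPy, not the method of any paper listed here, and all variable names are hypothetical.

```python
# Minimal sketch (illustrative only): symmetric per-tensor post-training
# int8 quantization of a weight matrix, a simple form of the mixed-precision
# tricks used to shrink transformer inference costs.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dense-layer weights from a trained model.
w_fp32 = rng.normal(size=(512, 512)).astype(np.float32)

# Map the range [-max|w|, +max|w|] onto the signed int8 range [-127, 127].
scale = np.abs(w_fp32).max() / 127.0
w_int8 = np.clip(np.round(w_fp32 / scale), -127, 127).astype(np.int8)

# Dequantize back to float32 for higher-precision accumulation at compute time.
w_deq = w_int8.astype(np.float32) * scale

print("mean abs quantization error:", np.abs(w_fp32 - w_deq).mean())
```

Storing `w_int8` plus a single `scale` cuts the weight memory footprint to roughly a quarter of float32; practical schemes refine this with per-channel scales and mixed precision across layers, keeping sensitive layers in higher precision.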
Papers
Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Stephanie C. Y. Chan, Adam Santoro, Andrew K. Lampinen, Jane X. Wang, Aaditya Singh, Pierre H. Richemond, Jay McClelland, Felix Hill
End-to-end symbolic regression with transformers
Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, François Charton
Constructing Open Cloze Tests Using Generation and Discrimination Capabilities of Transformers
Mariano Felice, Shiva Taslimipoor, Paula Buttery
Analysing similarities between legal court documents using natural language processing approaches based on Transformers
Raphael Souza de Oliveira, Erick Giovani Sperandio Nascimento
Data and Physics Driven Learning Models for Fast MRI -- Fundamentals and Methodologies from CNN, GAN to Attention and Transformers
Jiahao Huang, Yingying Fang, Yang Nan, Huanjun Wu, Yinzhe Wu, Zhifan Gao, Yang Li, Zidong Wang, Pietro Lio, Daniel Rueckert, Yonina C. Eldar, Guang Yang
Nowruz at SemEval-2022 Task 7: Tackling Cloze Tests with Transformers and Ordinal Regression
Mohammadmahdi Nouriborji, Omid Rohanian, David Clifton