Transformer Megatron Decepticons
Transformer models are being extensively investigated for a wide range of sequence processing tasks, moving beyond natural language processing into time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization to longer sequences, and understanding the mechanisms underlying in-context learning. These advances improve the accuracy and efficiency of applications across diverse fields while deepening the theoretical understanding of these models.
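For reference, the following is a minimal NumPy sketch of the scaled dot-product attention operation that all of the transformer variants listed below build on; efficiency work such as mixed-precision quantization targets the cost of exactly these matrix products, and length-generalization work concerns how the attention weights behave as the sequence dimension grows. The shapes and toy inputs are illustrative assumptions and are not taken from any of the papers.

import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V
    d = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d)     # (..., seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key axis
    return weights @ v                               # (..., seq_q, d_v)

# Toy usage: 4 query/key/value vectors of width 8; in a real transformer these
# come from learned linear projections of the token embeddings.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (4, 8)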
Papers
On the unreasonable vulnerability of transformers for image restoration -- and an easy fix
Shashank Agnihotri, Kanchana Vaishnavi Gandikota, Julia Grabinski, Paramanand Chandramouli, Margret Keuper
Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers
Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos, Michalis Vazirgiannis
UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition
Aidan Mannion, Thierry Chevalier, Didier Schwab, Lorraine Goeuriot
Comparison between transformers and convolutional models for fine-grained classification of insects
Rita Pucci, Vincent J. Kalkman, Dan Stowell
TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers
Alan John Varghese, Aniruddha Bora, Mengjia Xu, George Em Karniadakis
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei
Facing Off World Model Backbones: RNNs, Transformers, and S4
Fei Deng, Junyeong Park, Sungjin Ahn