Transformer Megatron Decepticons
Transformer models are being extensively investigated for sequence processing tasks well beyond natural language processing, including time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), strengthening generalization (particularly to sequences longer than those seen during training), and understanding the mechanisms behind in-context learning. These advances improve the accuracy and efficiency of applications across diverse fields while deepening the theoretical understanding of transformer models themselves.
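For readers new to the area, the papers below all build on the same core mechanism: scaled dot-product attention. The following is a minimal, self-contained sketch of single-head self-attention in NumPy; the function name, shapes, and toy data are illustrative only and are not drawn from any of the listed papers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V: arrays of shape (seq_len, d_model). Illustrative sketch only;
    real implementations add masking, multiple heads, and batching.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # numerically stable row-wise softmax
    return weights @ V                                   # each output is a weighted mix of values

# Toy usage: 4 tokens with 8-dimensional embeddings, attending to themselves.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)              # self-attention: Q = K = V = x
print(out.shape)  # (4, 8)
```

Much of the efficiency work mentioned above (quantization, optimized architectures) targets exactly this block, since the attention matrix grows quadratically with sequence length.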
Papers
Leveraging Transformers for Weakly Supervised Object Localization in Unconstrained Videos
Shakeeb Murtaza, Marco Pedersoli, Aydin Sarraf, Eric Granger
FairPFN: Transformers Can do Counterfactual Fairness
Jake Robertson, Noah Hollmann, Noor Awad, Frank Hutter
Learning Lane Graphs from Aerial Imagery Using Transformers
Martin Büchner, Simon Dorer, Abhinav Valada
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam
Unveiling and Controlling Anomalous Attention Distribution in Transformers
Ruiqing Yan, Xingbo Du, Haoyu Deng, Linghan Zheng, Qiuzhuang Sun, Jifang Hu, Yuhang Shao, Penghao Jiang, Jinrong Jiang, Lian Zhao
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
Hongkang Li, Meng Wang, Shuai Zhang, Sijia Liu, Pin-Yu Chen
The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers
Abhi Kamboj
Extracting thin film structures of energy materials using transformers
Chen Zhang, Valerie A. Niemann, Peter Benedek, Thomas F. Jaramillo, Mathieu Doucet
METRIK: Measurement-Efficient Randomized Controlled Trials using Transformers with Input Masking
Sayeri Lala, Niraj K. Jha