Transformer Megatron Decepticons
Transformer models are being studied for a widening range of sequence processing tasks beyond natural language processing, including time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), improving generalization (particularly to longer sequences), and understanding the mechanisms behind in-context learning. This work aims to make applications more accurate and efficient while deepening the theoretical understanding of these models.
Papers - Page 25
Personalised Drug Identifier for Cancer Treatment with Transformers using Auxiliary Information
Aishwarya Jayagopal, Hansheng Xue, Ziyang He, Robert J. Walsh, Krishna Kumar Hariprasannan, David Shao Peng Tan, Tuan Zea Tan, Jason J. Pitt, and 2 more authors
Can Transformers Predict Vibrations?
Fusataka Kuniyoshi, Yoshihide Sawada
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, Ievgen Redko
Reusing Softmax Hardware Unit for GELU Computation in Transformers
Christodoulos Peltekis, Kosmas Alexandridis, Giorgos Dimitrakopoulos
Why are Sensitive Functions Hard for Transformers?
Michael Hahn, Mark Rofin
Transformers Can Achieve Length Generalization But Not Robustly
Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou
Transformers, parallel computation, and logarithmic depth
Clayton Sanford, Daniel Hsu, Matus Telgarsky
I can't see it but I can Fine-tune it: On Encrypted Fine-tuning of Transformers using Fully Homomorphic Encryption
Prajwal Panzade, Daniel Takabi, Zhipeng Cai
FAST: Factorizable Attention for Speeding up Transformers
Armin Gerami, Monte Hoover, Pranav S. Dulepet, Ramani Duraiswami
Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
Mikail Khona, Maya Okawa, Jan Hula, Rahul Ramesh, Kento Nishi, Robert Dick, Ekdeep Singh Lubana, Hidenori Tanaka
How do Transformers perform In-Context Autoregressive Learning?
Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
Towards Understanding Inductive Bias in Transformers: A View From Infinity
Itay Lavie, Guy Gur-Ari, Zohar Ringel
Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna
Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains
Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar
Learning a Decision Tree Algorithm with Transformers
Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
Matteo Pagliardini, Amirkeivan Mohtashami, Francois Fleuret, Martin Jaggi
ClipFormer: Key-Value Clipping of Transformers on Memristive Crossbars for Write Noise Mitigation
Abhiroop Bhattacharjee, Abhishek Moitra, Priyadarshini Panda