Transformer Megatron Decepticons
Transformer models are being extensively investigated for a wide range of sequence processing tasks, moving beyond natural language processing to encompass time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization, particularly to sequences longer than those seen in training, and understanding the mechanisms underlying in-context learning. These advances improve the accuracy and efficiency of applications across diverse fields while deepening our theoretical understanding of these models.
Papers - Page 2
Predicting Stock Movement with BERTweet and Transformers
Michael Charles Albada, Mojolaoluwa Joshua Sonola
Georgia Institute of Technology

Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
Subhajit Maity, Killian Hitsman, Xin Li, Aritra Dutta
University of Central Florida

Transformers without Normalization
Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, Zhuang Liu
Meta ● New York University ● MIT ● Princeton University

Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings
Jakaria Islam Emon, Md Abu Salek, Kazi Tamanna Alam
Ltd. ● Barisal Information Technology College (BITC)

Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer, Yury Belousov, Slava Voloshynovskiy
University of Geneva

U-StyDiT: Ultra-high Quality Artistic Style Transfer Using Diffusion Transformers
Zhanjie Zhang, Ao Ma, Ke Cao, Jing Wang, Shanyuan Liu, Yuhang Ma, Bo Cheng, Dawei Leng, Yuhui Yin
Zhejiang University ● 360 AI Research

TransECG: Leveraging Transformers for Explainable ECG Re-identification Risk Analysis
Ziyu Wang, Elahe Khatibi, Kianoosh Kazemi, Iman Azimi, Sanaz Mousavi, Shaista Malik, Amir M. Rahmani

Implicit Reasoning in Transformers is Reasoning through Shortcuts
Tianhe Lin, Jian Xie, Siyu Yuan, Deqing Yang
Fudan University

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
Victor Shea-Jay Huang, Le Zhuo, Yi Xin, Zhaokai Wang, Peng Gao, Hongsheng Li
MMLab ● Shanghai AI Laboratory ● NJU ● SJTU

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
William Merrill, Ashish Sabharwal
New York University ● Allen Institute for AI

Three tiers of computation in transformers and in brain architectures
E Graham, R Granger
Dartmouth College

Transformers for molecular property prediction: Domain adaptation efficiently improves performance
Afnan Sultan, Max Rausch-Dupont, Shahrukh Khan, Olga Kalinina, Andrea Volkamer, Dietrich Klakow
Saarland University ● Medical Faculty

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers
Gilad Yehudai, Clayton Sanford, Maya Bechler-Speicher, Orr Fischer, Ran Gilad-Bachrach, Amir Globerson
New York University ● Google Research ● Tel-Aviv University ● Meta ● Bar-Ilan University

Compositional Reasoning with Transformers, RNNs, and Chain of Thought
Gilad Yehudai, Noah Amsel, Joan Bruna
New York University ● Flatiron Institute