Transformer Megatron Decepticons
Transformer models are being extensively investigated for a growing range of sequence-processing tasks, extending beyond natural language processing to time-series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), strengthening generalization (particularly to sequences longer than those seen during training), and understanding the mechanisms underlying in-context learning. These advances improve the accuracy and efficiency of many applications while deepening the theoretical understanding of these models.
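As background for the papers below, the mechanism they all build on is scaled dot-product attention. The following is a minimal NumPy sketch of that standard operation (illustrative only; it is not taken from any of the listed papers, and the function and variable names are my own):

```python
# Minimal sketch of scaled dot-product attention, the core operation of
# Transformer models. Names (scaled_dot_product_attention, Q, K, V) are
# illustrative, not from any listed paper.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_q, n_k) similarity logits
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V                   # convex combination of value vectors

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # 4 queries, dimension 8
K = rng.standard_normal((6, 8))   # 6 keys
V = rng.standard_normal((6, 8))   # 6 values
out = scaled_dot_product_attention(Q, K, V)  # shape (4, 8)
```

The quadratic cost of the `scores` matrix in sequence length is what motivates the efficiency-oriented work listed below (e.g., caching, structured pruning, and $k$NN-based attention approximations).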
Papers
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
Joseph Liu, Joshua Geddes, Ziyu Guo, Haomiao Jiang, Mahesh Kumar Nandwana
Systolic Arrays and Structured Pruning Co-design for Efficient Transformers in Edge Systems
Pedro Palacios, Rafael Medina, Jean-Luc Rouas, Giovanni Ansaloni, David Atienza
A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift
Sanath Budakegowdanadoddi Nagaraju, Brian Bernhard Moser, Tobias Christian Nauen, Stanislav Frolov, Federico Raue, Andreas Dengel
Adversarial Robustness of In-Context Learning in Transformers for Linear Regression
Usman Anwar, Johannes von Oswald, Louis Kirsch, David Krueger, Spencer Frei
Measure-to-measure interpolation using Transformers
Borjan Geshkovski, Philippe Rigollet, Domènec Ruiz-Balet
Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player's Trajectory
Ali K. AlShami, Terrance Boult, Jugal Kalita
LidaRefer: Outdoor 3D Visual Grounding for Autonomous Driving with Transformers
Yeong-Seung Baek, Heung-Seon Oh
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis
Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Rina Panigrahy
Memorized action chunking with Transformers: Imitation learning for vision-based tissue surface scanning
Bochen Yang, Kaizhong Deng, Christopher J Peters, George Mylonas, Daniel S. Elson
$k$NN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Themistoklis Haris