Transformer Models
Transformer models are being investigated for a wide range of sequence-processing tasks, moving beyond natural language processing to time series forecasting, image recognition, and scientific computing applications such as solving partial differential equations. Current research focuses on improving efficiency (e.g., through mixed-precision quantization and optimized architectures), enhancing generalization, particularly to longer sequences, and understanding the mechanisms behind in-context learning. These advances improve the accuracy and efficiency of many downstream applications while deepening the theoretical understanding of how transformers work.
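As a rough illustration of the efficiency angle mentioned above, the sketch below applies plain dynamic int8 quantization to the feed-forward projections of a toy transformer block in PyTorch. The layer sizes are arbitrary and this is not the mixed-precision scheme of any listed paper; it only shows the basic weight-quantization trade-off such work builds on.

```python
import torch
import torch.nn as nn

# Toy transformer feed-forward block; dimensions are illustrative only.
ffn = nn.Sequential(
    nn.Linear(64, 256),  # expand
    nn.ReLU(),
    nn.Linear(256, 64),  # project back
)

# Dynamic quantization: weights are stored as int8, activations stay in float
# and are quantized on the fly when each matmul runs.
quantized_ffn = torch.quantization.quantize_dynamic(
    ffn, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 128, 64)    # (batch, sequence, embedding)
print(quantized_ffn(x).shape)  # torch.Size([2, 128, 64])
```

The quantized block keeps the same interface and output shape as the float model; the saving comes from int8 weight storage and integer matmuls, at the cost of some accuracy that research like the work surveyed here tries to recover with per-layer (mixed) bit-widths.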
Papers
OctFormer: Octree-based Transformers for 3D Point Clouds
Peng-Shuai Wang
MTLSegFormer: Multi-task Learning with Transformers for Semantic Segmentation in Precision Agriculture
Diogo Nunes Goncalves, Jose Marcato Junior, Pedro Zamboni, Hemerson Pistori, Jonathan Li, Keiller Nogueira, Wesley Nunes Goncalves
On the Expressivity Role of LayerNorm in Transformers' Attention
Shaked Brody, Uri Alon, Eran Yahav
Analogy-Forming Transformers for Few-Shot 3D Parsing
Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki
Contour Completion by Transformers and Its Application to Vector Font Data
Yusuke Nagata, Brian Kenji Iwana, Seiichi Uchida
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
Frederik Kunstner, Jacques Chen, Jonathan Wilder Lavington, Mark Schmidt