Vanilla Transformer
The vanilla Transformer, the foundational architecture behind many successful machine learning models, is being extensively investigated to improve its efficiency and performance across diverse tasks. Current research focuses on adapting the architecture to specific problem domains, such as robotics, time series forecasting, and multimodal learning, often through novel attention mechanisms, tokenization strategies, and the incorporation of inductive biases. These efforts aim to enhance model interpretability, reduce computational cost, and improve accuracy, with impact ranging from natural language processing and computer vision to more specialized applications such as soil temperature prediction and traffic forecasting. The ongoing refinement of the vanilla Transformer is crucial for advancing the capabilities and applicability of machine learning across a wide range of scientific and practical domains.
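For reference, below is a minimal sketch of one vanilla Transformer encoder layer in PyTorch: multi-head scaled dot-product self-attention followed by a position-wise feed-forward network, each wrapped in a residual connection and layer normalization, as in the original architecture. The class name and hyperparameter values are illustrative assumptions, not taken from any of the papers listed below.

```python
# Minimal sketch of one vanilla Transformer encoder layer (post-norm variant).
import torch
import torch.nn as nn

class VanillaEncoderLayer(nn.Module):
    def __init__(self, d_model=128, n_heads=4, d_ff=512, dropout=0.1):
        super().__init__()
        # Multi-head scaled dot-product self-attention.
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        # Position-wise feed-forward network.
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # Self-attention sublayer: residual connection, then layer norm.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + self.drop(attn_out))
        # Feed-forward sublayer: residual connection, then layer norm.
        x = self.norm2(x + self.drop(self.ff(x)))
        return x

# Example: encode a batch of 8 sequences of 32 tokens, each a 128-dim vector.
layer = VanillaEncoderLayer()
out = layer(torch.randn(8, 32, 128))
print(out.shape)  # torch.Size([8, 32, 128])
```

Domain-specific variants of the kind surveyed here typically modify pieces of this layer (the attention mechanism, the tokenization feeding into it, or added inductive biases) while keeping the overall structure intact.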
Papers
SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity
Chuang Yang, Renhe Jiang, Xiaohang Xu, Chuan Xiao, Kaoru Sezaki
Transfer Learning on Transformers for Building Energy Consumption Forecasting -- A Comparative Study
Robert Spencer, Surangika Ranathunga, Mikael Boulic, Andries (Hennie) van Heerden, Teo Susnjak