Transformer-Based Models
Transformer-based models are a class of neural networks that achieve state-of-the-art results across diverse fields by using self-attention to capture long-range dependencies in sequential data. Because self-attention scales quadratically with sequence length, current research focuses on reducing this cost, leading to subquadratic alternatives such as Mamba and parameter-efficient modifications such as LoRA for cheaper adaptation and inference. These advances are improving both accuracy and efficiency in applications ranging from speech recognition and natural language processing to computer vision and time-series forecasting, including deployment on resource-constrained devices.
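To make the quadratic cost mentioned above concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. The function name, weight matrices, and toy dimensions are illustrative assumptions, not taken from any of the papers below; the point is that the score matrix has shape (seq_len, seq_len), which is the source of the quadratic scaling that architectures like Mamba aim to avoid.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence (illustrative sketch).

    x:             (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_head) projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project inputs to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq_len, seq_len): quadratic in sequence length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                               # each position is a weighted sum of all values

# Toy usage with assumed sizes: 8 tokens, model width 16, head width 4.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 8, 16, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (8, 4)
```

Doubling seq_len quadruples the size of the score matrix, which is why long-sequence work (e.g., distilling Transformers into state-space models) targets this step specifically.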
Papers
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick, Kevin Y. Li, Eric P. Xing, J. Zico Kolter, Albert Gu
sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting
Jiaheng Yin, Zhengxin Shi, Jianshen Zhang, Xiaomin Lin, Yulin Huang, Yongzhi Qi, Wei Qi
Rethinking Transformer-based Multi-document Summarization: An Empirical Investigation
Congbo Ma, Wei Emma Zhang, Dileepa Pitawela, Haojie Zhuang, Yanfeng Shu
Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets
Camillo Maria Caruso, Paolo Soda, Valerio Guarrasi