Baseline Transformer
Baseline Transformer models serve as foundational architectures for many machine learning tasks, particularly in natural language processing and computer vision, but their computational demands often limit practical deployment. Current research focuses on improving efficiency through techniques such as attention mechanism optimization, novel positional encoding methods, and architectural modifications like incorporating control theory principles or employing mixture-of-experts layers. These efforts aim to reduce parameter counts, memory usage, and computational complexity while maintaining or improving performance, thereby broadening the applicability of Transformer models across diverse domains.
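To make the efficiency discussion concrete, the sketch below shows the scaled dot-product attention at the core of the baseline Transformer. It is a minimal NumPy illustration, not the implementation from any particular paper listed here; the quadratic score matrix it materializes (one entry per query-key pair) is the main cost that attention-optimization techniques aim to reduce.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Baseline Transformer attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Score matrix is (seq_q, seq_k): quadratic in sequence length.
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy example (hypothetical shapes): 4 positions, head dimension 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Because the score matrix grows with the square of the sequence length, much of the work surveyed above (sparse or linearized attention, alternative positional encodings, mixture-of-experts routing) targets this step or the dense feed-forward layers that follow it.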