Transformer Training
Transformer training focuses on improving the performance and efficiency of transformer-based models through advances in both algorithms and hardware. Current research emphasizes stabilizing training, mitigating the quadratic cost of self-attention in sequence length (e.g., via linear attention mechanisms), and developing efficient training strategies for long sequences and large datasets. These efforts are crucial for extending transformers to increasingly complex tasks and resource-constrained environments, with impact across natural language processing, computer vision, scientific modeling, and forecasting.
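To illustrate the linear-attention idea mentioned above, the sketch below shows one common formulation (a positive feature map replacing the softmax, as in kernelized attention): because the key-value product can be computed once and reused for every query, the cost drops from quadratic to linear in sequence length. This is a minimal, illustrative example rather than the method of any specific paper listed here; the `elu(x) + 1` feature map, tensor shapes, and function name are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """Kernelized (linear) attention sketch.

    Replaces softmax(QK^T)V with phi(Q) (phi(K)^T V), where phi(x) = elu(x) + 1
    keeps scores positive. Computing phi(K)^T V first costs O(n * d^2) instead
    of the O(n^2 * d) of standard attention.
    Shapes (illustrative): q, k, v are (batch, seq_len, dim).
    """
    phi_q = F.elu(q) + 1.0  # (B, N, D)
    phi_k = F.elu(k) + 1.0  # (B, N, D)

    # Aggregate keys and values once: sum_n phi(k_n) v_n^T -> (B, D, D_v)
    kv = torch.einsum("bnd,bne->bde", phi_k, v)
    # Per-query normalizer: phi(q_n) . sum_m phi(k_m)
    z = 1.0 / (torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + 1e-6)
    # Normalized output for every position
    return torch.einsum("bnd,bde,bn->bne", phi_q, kv, z)

if __name__ == "__main__":
    q = torch.randn(2, 1024, 64)
    k = torch.randn(2, 1024, 64)
    v = torch.randn(2, 1024, 64)
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 1024, 64])
```

Note that this formulation trades the exact softmax weighting for a kernel approximation; much of the research summarized above concerns when that trade-off preserves accuracy at long sequence lengths.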