Large Scale Pretraining
Large-scale pretraining leverages massive datasets to train foundation models that can then be fine-tuned for specific downstream tasks, significantly improving efficiency and performance compared to training from scratch. Current research focuses on optimizing pretraining strategies through data curation techniques such as deduplication and joint example selection, on advanced architectures such as Vision Transformers, and on parameter-efficient adaptation methods like LoRA. This approach has yielded substantial improvements across diverse fields, from natural language processing and computer vision to drug discovery and remote sensing, by enabling high-performing models with reduced computational cost and data requirements.
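To make the adaptation idea concrete, the sketch below shows a minimal LoRA-style linear layer in PyTorch: the pretrained weight is kept frozen and only a small low-rank update (B @ A, scaled by alpha / r) is trained. This is an illustrative example under generic assumptions, not the implementation used in any of the papers collected here; the class name, dimensions, and hyperparameters are placeholders.

```python
# Minimal, illustrative LoRA layer (hypothetical sketch, not a specific paper's code):
# the frozen weight W is augmented with a trainable low-rank update B @ A scaled by alpha / r.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        # Pretrained weight: frozen during fine-tuning.
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        # Low-rank adapters: only these few parameters are trained.
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.normal_(self.lora_A, std=0.02)  # B stays zero, so the update starts at 0
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T                           # frozen pretrained path
        update = (x @ self.lora_A.T) @ self.lora_B.T       # trainable low-rank path
        return base + self.scaling * update


# Usage sketch: adapt a single projection while keeping the base weight frozen.
layer = LoRALinear(768, 768, r=8)
x = torch.randn(4, 768)
print(layer(x).shape)  # torch.Size([4, 768])
```

Because only the two small adapter matrices receive gradients, fine-tuning touches a tiny fraction of the parameters, which is what makes this style of adaptation attractive for large pretrained models.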