Pre-Training
Pre-training trains large models on massive datasets to learn generalizable features before they are fine-tuned for specific tasks. Current research focuses on improving data efficiency through carefully curated datasets, task-oriented pre-training objectives, and novel data selection methods, often built on transformer architectures and contrastive learning. These advances aim to reduce computational costs and improve model performance across diverse domains, from natural language processing and computer vision to medical imaging and graph analysis. The ultimate goal is to create more robust, efficient, and adaptable models with reduced environmental impact.
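To make the pre-train-then-fine-tune pattern concrete, here is a minimal sketch (not drawn from any paper listed below): a small encoder is pre-trained on unlabeled data with a SimCLR-style contrastive loss, then fine-tuned with a linear head on a labeled set. The encoder sizes, the noise-based "augmentation", and the synthetic data are placeholder assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoder; layer sizes are placeholders, not from any cited paper.
encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))

def augment(x):
    # Stand-in "view" generator: additive noise instead of real augmentations.
    return x + 0.1 * torch.randn_like(x)

def nt_xent(z1, z2, temperature=0.1):
    # SimCLR-style contrastive loss: the two views of each example are positives.
    z = F.normalize(torch.cat([z1, z2]), dim=1)          # (2N, d)
    sim = z @ z.t() / temperature                        # pairwise similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# --- Pre-training on (synthetic) unlabeled data ---
unlabeled = torch.randn(512, 128)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for _ in range(10):
    idx = torch.randperm(unlabeled.size(0))[:64]
    x = unlabeled[idx]
    loss = nt_xent(encoder(augment(x)), encoder(augment(x)))
    opt.zero_grad(); loss.backward(); opt.step()

# --- Fine-tuning the pre-trained encoder with a small labeled set ---
head = nn.Linear(64, 10)
labeled_x, labeled_y = torch.randn(128, 128), torch.randint(0, 10, (128,))
ft_opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
for _ in range(10):
    loss = F.cross_entropy(head(encoder(labeled_x)), labeled_y)
    ft_opt.zero_grad(); loss.backward(); ft_opt.step()
```

The same two-stage structure underlies most of the work below; what varies is the pre-training objective (contrastive, position prediction, motion forecasting) and how the pre-training data are selected or curated.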
Papers
Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning
Shibani Santurkar, Yann Dubois, Rohan Taori, Percy Liang, Tatsunori Hashimoto
Position Prediction as an Effective Pretraining Strategy
Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind
GaitForeMer: Self-Supervised Pre-Training of Transformers via Human Motion Forecasting for Few-Shot Gait Impairment Severity Estimation
Mark Endo, Kathleen L. Poston, Edith V. Sullivan, Li Fei-Fei, Kilian M. Pohl, Ehsan Adeli
Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning
John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael Rabbat