Pre-Training
Pre-training trains large models on massive datasets so they learn generalizable features before being fine-tuned for specific tasks. Current research focuses on improving data efficiency through carefully curated datasets, task-oriented pre-training, and new data selection methods, often built on transformer architectures and contrastive learning. These advances aim to reduce computational costs and improve model performance across diverse domains, from natural language processing and computer vision to medical imaging and graph analysis. The ultimate goal is to create more robust, efficient, and adaptable models with reduced environmental impact.
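The sketch below is a minimal, illustrative example of the pre-train-then-fine-tune recipe described above, not the method of any paper listed here: a small transformer encoder is first pre-trained with a contrastive (InfoNCE-style) objective on unlabeled sequences, then fine-tuned with a linear head on a labeled task. All hyperparameters, the masking augmentation, and the random toy data are assumptions chosen for brevity.

```python
# Minimal pre-training + fine-tuning sketch (PyTorch); data and sizes are toy assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small transformer encoder that mean-pools token states into one vector."""
    def __init__(self, vocab_size=1000, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, tokens):                       # tokens: (B, T) int64
        h = self.transformer(self.embed(tokens))     # (B, T, d_model)
        return h.mean(dim=1)                         # pooled: (B, d_model)

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss: matching views are positives, all others negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature               # (B, B) similarity matrix
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

def augment(tokens, mask_prob=0.15, mask_id=0):
    """Toy augmentation: randomly mask tokens to create a second 'view'."""
    mask = torch.rand(tokens.shape, device=tokens.device) < mask_prob
    return tokens.masked_fill(mask, mask_id)

# --- Stage 1: contrastive pre-training on unlabeled data ---
encoder = Encoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for step in range(100):
    batch = torch.randint(1, 1000, (32, 16))         # fake unlabeled token batch
    loss = info_nce(encoder(augment(batch)), encoder(augment(batch)))
    opt.zero_grad(); loss.backward(); opt.step()

# --- Stage 2: fine-tuning encoder + linear head on a small labeled task ---
head = nn.Linear(64, 2)
ft_opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
for step in range(50):
    batch = torch.randint(1, 1000, (32, 16))         # fake labeled batch
    labels = torch.randint(0, 2, (32,))
    loss = F.cross_entropy(head(encoder(batch)), labels)
    ft_opt.zero_grad(); loss.backward(); ft_opt.step()
```

The point of the two stages is that the contrastive objective needs no labels, so the encoder can be trained on far more data than the downstream task provides; fine-tuning then adapts the learned features with a much smaller labeled set and a lower learning rate.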
Papers
An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation
Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang Ji, Antoni B. Chan
PLOG: Table-to-Logic Pretraining for Logical Table-to-Text Generation
Ao Liu, Haoyu Dong, Naoaki Okazaki, Shi Han, Dongmei Zhang
ORCA: Interpreting Prompted Language Models via Locating Supporting Data Evidence in the Ocean of Pretraining Data
Xiaochuang Han, Yulia Tsvetkov