Pre Training
Pre-training involves initially training large models on massive datasets to learn generalizable features before fine-tuning them for specific tasks. Current research focuses on improving data efficiency through techniques like carefully curated datasets, task-oriented pre-training, and novel data selection methods, often employing transformer architectures and contrastive learning. These advancements aim to reduce computational costs and enhance model performance across diverse domains, impacting fields ranging from natural language processing and computer vision to medical imaging and graph analysis. The ultimate goal is to create more robust, efficient, and adaptable models with reduced environmental impact.
Papers
Understanding In-Context Learning via Supportive Pretraining Data
Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz, Tianlu Wang
Distributive Pre-Training of Generative Modeling Using Matrix-Product States
Sheng-Hsuan Lin, Olivier Kuijpers, Sebastian Peterhansl, Frank Pollmann
On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training Research
Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Alham Fikri Aji, Genta Indra Winata, Radityo Eko Prasojo, Phil Blunsom, Adhiguna Kuncoro
Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications
Han Xie, Da Zheng, Jun Ma, Houyu Zhang, Vassilis N. Ioannidis, Xiang Song, Qing Ping, Sheng Wang, Carl Yang, Yi Xu, Belinda Zeng, Trishul Chilimbi
"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data
Orion Weller, Marc Marone, Nathaniel Weir, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme
Textually Pretrained Speech Language Models
Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi
EnCore: Fine-Grained Entity Typing by Pre-Training Entity Encoders on Coreference Chains
Frank Mtumbuka, Steven Schockaert