Supervised Fine-Tuning
Supervised fine-tuning (SFT) adapts pre-trained large language models (LLMs) to specific tasks by training them on labeled data, with the goal of improving task performance and alignment with human preferences. Current research on optimizing SFT explores loss functions beyond the standard cross-entropy, techniques to mitigate training imbalances and overfitting, and the interplay between SFT and reinforcement learning. This work makes adapting LLMs more efficient and effective across applications ranging from question answering and code generation to specialized domains such as biomedicine and legal text processing.
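To make the cross-entropy objective mentioned above concrete, here is a minimal sketch of a token-level SFT loss in plain Python. It is an illustration under stated assumptions, not the method of any paper listed here: the function name `sft_loss` and the `loss_mask` argument are hypothetical, and masking out prompt tokens so that only response tokens contribute is a common practice assumed for this example rather than something the summary specifies.

```python
import math

def sft_loss(logits, targets, loss_mask):
    """Mean cross-entropy over the positions where loss_mask is 1.

    logits:    one list of unnormalized vocabulary scores per position
    targets:   the labeled (target) token id for each position
    loss_mask: 0/1 flag per position; assumed here to be 0 for prompt
               tokens and 1 for response tokens (hypothetical convention)
    """
    total, count = 0.0, 0
    for scores, target, keep in zip(logits, targets, loss_mask):
        if not keep:
            continue  # masked positions do not contribute to the loss
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the target token.
        total += log_z - scores[target]
        count += 1
    if count == 0:
        raise ValueError("loss_mask selects no positions")
    return total / count

# Toy example: vocabulary of 3 tokens, two positions, first one masked.
toy_logits = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_loss = sft_loss(toy_logits, targets=[2, 1], loss_mask=[0, 1])
print(toy_loss)
```

With uniform scores at the only unmasked position, the loss equals the log of the vocabulary size. In practice this computation runs in a tensor library over batches; the loop form is used here only to keep each step visible.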
Papers
L3Ms -- Lagrange Large Language Models
Guneet S. Dhillon, Xingjian Shi, Yee Whye Teh, Alex Smola
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training
Michael Pieler, Marco Bellagente, Hannah Teufel, Duy Phung, Nathan Cooper, Jonathan Tow, Paulo Rocha, Reshinth Adithyan, Zaid Alyafeai, Nikhil Pinnaparaju, Maksym Zhuravinskyi, Carlos Riquelme
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation
Junyu Luo, Xiao Luo, Xiusi Chen, Zhiping Xiao, Wei Ju, Ming Zhang
RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards
Xinze Li, Sen Mei, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Hao Chen, Ge Yu, Zhiyuan Liu, Maosong Sun, Chenyan Xiong
Packing Analysis: Packing Is More Appropriate for Large Models or Datasets in Supervised Fine-tuning
Shuhe Wang, Guoyin Wang, Yizhong Wang, Jiwei Li, Eduard Hovy, Chen Guo
PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency
Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, Yuta Hirokawa, Kentaro Imajo, Toshiki Kataoka, Hiroyoshi Komatsu, Hiroaki Mikami, Tsuguo Mogami, Shogo Murai, Kosuke Nakago, Daisuke Nishino, Toru Ogawa, Daisuke Okanohara, Yoshihiko Ozaki, Shotaro Sano, Shuji Suzuki, Tianqi Xu, Toshihiko Yanase (Preferred Elements, Inc.)