Training Data
Training data is central to machine learning model development. Current research focuses on improving data quality and efficiency and on mitigating bias. Active areas include generating synthetic data to address scarcity or privacy concerns, developing algorithms that optimize data selection and usage (e.g., self-paced learning, active learning), and mitigating problems such as data contamination and imbalance through techniques such as data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data significantly affect model performance, generalization, and robustness across applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
Papers
Data Contamination Through the Lens of Time
Manley Roberts, Himanshu Thakur, Christine Herlihy, Colin White, Samuel Dooley
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier, Taro Watanabe, Yixuan Su
PuoBERTa: Training and evaluation of a curated language model for Setswana
Vukosi Marivate, Moseli Mots'Oehli, Valencia Wagner, Richard Lastrucci, Isheanesu Dzingirai
Training and Predicting Visual Error for Real-Time Applications
João Libório Cardoso, Bernhard Kerbl, Lei Yang, Yury Uralsky, Michael Wimmer
Federated Meta-Learning for Few-Shot Fault Diagnosis with Representation Encoding
Jixuan Cui, Jun Li, Zhen Mei, Kang Wei, Sha Wei, Ming Ding, Wen Chen, Song Guo
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia Han, Jared Tanner, Shiwei Liu, Rongrong Ji
Strategies and impact of learning curve estimation for CNN-based image classification
Laura Didyk, Brayden Yarish, Michael A. Beck, Christopher P. Bidinosti, Christopher J. Henry
Debias the Training of Diffusion Models
Hu Yu, Li Shen, Jie Huang, Man Zhou, Hongsheng Li, Feng Zhao
Effects of Human Adversarial and Affable Samples on BERT Generalization
Aparna Elangovan, Jiayuan He, Yuan Li, Karin Verspoor