Training Data
Training data is central to machine learning model development, and current research focuses on improving data quality, using data more efficiently, and mitigating bias. Active areas include generating synthetic data to address scarcity or privacy concerns, developing algorithms that optimize data selection and usage (e.g., self-paced learning, active learning), and addressing problems such as data contamination and imbalance through techniques such as data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data significantly affect model performance, generalization, and robustness, in applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
Papers
End-to-end Training and Decoding for Pivot-based Cascaded Translation Model
Hao Cheng, Meng Zhang, Liangyou Li, Qun Liu, Zhihua Zhang
DocLangID: Improving Few-Shot Training to Identify the Language of Historical Documents
Furkan Simsek, Brian Pfitzmann, Hendrik Raetz, Jona Otholt, Haojin Yang, Christoph Meinel
Federated Learning for Predictive Maintenance and Quality Inspection in Industrial Applications
Viktorija Pruckovskaja, Axel Weissenfeld, Clemens Heistracher, Anita Graser, Julia Kafka, Peter Leputsch, Daniel Schall, Jana Kemnitz
Shot Optimization in Quantum Machine Learning Architectures to Accelerate Training
Koustubh Phalak, Swaroop Ghosh
Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation
Yunjie Ji, Yan Gong, Yong Deng, Yiping Peng, Qiang Niu, Baochang Ma, Xiangang Li
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers
Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek, Matthias Kleinert
Who breaks early, looses: goal oriented training of deep neural networks based on port Hamiltonian dynamics
Julian Burghoff, Marc Heinrich Monells, Hanno Gottschalk
MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data
Tianyu Han, Lisa C. Adams, Jens-Michalis Papaioannou, Paul Grundmann, Tom Oberhauser, Alexander Löser, Daniel Truhn, Keno K. Bressem