Training Data
Training data is crucial for machine learning model development; current research focuses on improving data quality and efficiency and on mitigating bias. Active areas include generating synthetic data to address scarcity or privacy concerns, developing algorithms that optimize data selection and usage (e.g., self-paced learning, active learning), and mitigating issues such as data contamination and imbalance through techniques like data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data strongly affect model performance, generalization, and robustness, influencing applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
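As a concrete illustration of the data-selection strategies mentioned above, the sketch below shows pool-based active learning with simple uncertainty sampling. It is a minimal, generic example, not the method of any paper listed here: the synthetic dataset, the logistic-regression learner, the query budget, and the batch size are all illustrative assumptions.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.
# Assumptions (not from the papers below): synthetic binary-classification
# data, a LogisticRegression learner, 10 query rounds of 10 examples each.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic pool: a small labeled seed set plus a large "unlabeled" pool.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = rng.choice(len(X), size=20, replace=False)
pool = np.setdiff1d(np.arange(len(X)), labeled)

model = LogisticRegression(max_iter=1000)

for _ in range(10):  # query budget: 10 rounds of 10 examples each
    model.fit(X[labeled], y[labeled])

    # Uncertainty sampling: query the pool points whose predicted class
    # probability is closest to 0.5 (least confident for a binary task).
    proba = model.predict_proba(X[pool])[:, 1]
    uncertainty = -np.abs(proba - 0.5)
    query = pool[np.argsort(uncertainty)[-10:]]

    # "Label" the queried points (labels are already known here) and
    # move them from the unlabeled pool into the labeled set.
    labeled = np.concatenate([labeled, query])
    pool = np.setdiff1d(pool, query)

print("labeled set size:", len(labeled),
      "accuracy on remaining pool:", model.score(X[pool], y[pool]))
```

The design point is that labeling effort is spent on the examples the current model is least sure about, which is one way research on data selection tries to get more performance per labeled example.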
Papers
MS-PS: A Multi-Scale Network for Photometric Stereo With a New Comprehensive Training Dataset
Clément Hardy, Yvain Quéau, David Tschumperlé
Training Data Improvement for Image Forgery Detection using Comprint
Hannes Mareen, Dante Vanden Bussche, Glenn Van Wallendael, Luisa Verdoliva, Peter Lambert
A Survey of Learning Curves with Bad Behavior: or How More Data Need Not Lead to Better Performance
Marco Loog, Tom Viering
Combating noisy labels in object detection datasets
Krystian Chachuła, Jakub Łyskawa, Bartłomiej Olber, Piotr Frątczak, Adam Popowicz, Krystian Radlak
TrustGAN: Training safe and trustworthy deep learning models through generative adversarial networks
Hélion du Mas des Bourboux