Training Data
Training data is central to machine learning model development, and current research focuses on improving data quality and efficiency and on mitigating bias. Active areas include generating synthetic data to address scarcity or privacy concerns; developing algorithms that optimize data selection and usage, such as self-paced learning and active learning; and countering problems like data contamination and class imbalance through data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data strongly affect model performance, generalization, and robustness across applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
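As a toy illustration of the data-augmentation idea mentioned above, the sketch below jitters a numeric feature vector to produce extra training examples. It is a minimal stand-in, not any specific method from the papers listed; the `augment` helper and its parameters are hypothetical.

```python
import random

def augment(features, rng, n_variants=2, noise=0.05):
    """Return n_variants jittered copies of a numeric feature vector.

    Hypothetical helper: adds small Gaussian noise to each feature,
    a simple form of data augmentation for tabular/vector data.
    """
    return [
        [x + rng.gauss(0.0, noise) for x in features]
        for _ in range(n_variants)
    ]

rng = random.Random(0)
features = [0.2, 0.5, 0.9]
variants = augment(features, rng)
```

Each variant stays close to the original sample, so the label can usually be reused, which is what makes this kind of augmentation a cheap way to expand a small dataset.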
Papers
Improving generalization by mimicking the human visual diet
Spandan Madan, You Li, Mengmi Zhang, Hanspeter Pfister, Gabriel Kreiman
Reconstructing Training Data from Trained Neural Networks
Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani
Towards ML Methods for Biodiversity: A Novel Wild Bee Dataset and Evaluations of XAI Methods for ML-Assisted Rare Species Annotations
Teodor Chiaburu, Felix Biessmann, Frank Hausser
Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows
Phillip Si, Zeyi Chen, Subham Sekhar Sahoo, Yair Schiff, Volodymyr Kuleshov
Toward Student-Oriented Teacher Network Training For Knowledge Distillation
Chengyu Dong, Liyuan Liu, Jingbo Shang
Downlink Power Allocation in Massive MIMO via Deep Learning: Adversarial Attacks and Training
B. R. Manoj, Meysam Sadeghi, Erik G. Larsson