Training Data
Training data is crucial for machine learning model development, and current research focuses on improving data quality and efficiency and on mitigating bias. Active areas include generating synthetic data to address scarcity or privacy concerns, developing algorithms that optimize data selection and usage (e.g., self-paced learning, active learning), and addressing issues such as data contamination and class imbalance through techniques like data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data strongly affect model performance, generalization, and robustness, influencing applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
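As a concrete illustration of the data-selection theme above, the sketch below shows uncertainty sampling, a standard active-learning heuristic: given a model's predicted class probabilities over an unlabeled pool, it selects the examples with the highest predictive entropy for labeling. The `select_most_uncertain` helper and the toy probabilities are hypothetical examples for illustration, not taken from any of the papers listed here.

```python
import numpy as np

def select_most_uncertain(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k pool examples with the highest predictive entropy.

    probs: (n_pool, n_classes) array of model-predicted class probabilities.
    """
    eps = 1e-12                                   # guard against log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[-k:][::-1]         # most uncertain first

# Toy unlabeled pool: 5 examples, 3 classes.
pool_probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction -> low labeling value
    [0.40, 0.35, 0.25],   # uncertain -> good candidate for labeling
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],   # near-uniform -> most uncertain
    [0.90, 0.05, 0.05],
])
print(select_most_uncertain(pool_probs, k=2))     # -> [3 1]
```

In a full active-learning loop, the selected indices would be sent to an annotator, added to the labeled set, and the model retrained before the next round of selection.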
Papers
Developing the Reliable Shallow Supervised Learning for Thermal Comfort using ASHRAE RP-884 and ASHRAE Global Thermal Comfort Database II
Kanisius Karyono, Badr M. Abdullah, Alison J. Cotgrave, Ana Bras, Jeff Cullen
Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threat Models
Naman D Singh, Francesco Croce, Matthias Hein
The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning
Zhenmei Shi, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha
DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
Samyak Jain, Sravanti Addepalli, Pawan Sahu, Priyam Dey, R. Venkatesh Babu