Training Data
Training data is crucial for machine learning model development, and current research focuses on improving data quality and efficiency while mitigating biases. Active areas include generating synthetic data to address scarcity or privacy concerns; developing algorithms that optimize data selection and usage (e.g., self-paced learning, active learning); and addressing issues such as data contamination and class imbalance through data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data significantly affect model performance, generalization, and robustness, with consequences across applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
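Of the data-selection techniques mentioned above, active learning is easy to illustrate: the model queries labels only for the examples it is least sure about, so annotation effort goes where it helps most. A minimal sketch of pool-based uncertainty sampling follows; all names (`uncertainty`, `select_queries`) are illustrative, and a real pipeline would rank predictions from an actual model rather than a hard-coded list.

```python
# Hypothetical sketch of pool-based active learning via uncertainty
# sampling: we label the unlabeled pool items whose predicted
# probabilities are closest to 0.5 (maximal binary uncertainty).

def uncertainty(prob):
    """Confidence margin: distance of a predicted probability from 0.5.
    Smaller values mean the model is less certain about the item."""
    return abs(prob - 0.5)

def select_queries(pool_probs, k):
    """Return indices of the k most uncertain pool items."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: uncertainty(pool_probs[i]))
    return ranked[:k]

# Illustrative model outputs for five unlabeled pool items.
probs = [0.95, 0.52, 0.10, 0.48, 0.80]
print(select_queries(probs, 2))  # → [1, 3]: the two least confident items
```

In practice the selected items would be sent to an annotator, added to the labeled set, and the model retrained, repeating until the labeling budget is spent.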
Papers
Learning from Predictions: Fusing Training and Autoregressive Inference for Long-Term Spatiotemporal Forecasts
Pantelis R. Vlachas, Petros Koumoutsakos
Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks
Sudipta Kar, Giuseppe Castellucci, Simone Filice, Shervin Malmasi, Oleg Rokhlenko
Leveraging Contaminated Datasets to Learn Clean-Data Distribution with Purified Generative Adversarial Networks
Bowen Tian, Qinliang Su, Jianxing Yu
Symbiosis of an artificial neural network and models of biological neurons: training and testing
Tatyana Bogatenko, Konstantin Sergeev, Andrei Slepnev, Jürgen Kurths, Nadezhda Semenova