Training Data
Training data is crucial to machine learning model development, and current research focuses on improving data quality and efficiency and on mitigating biases. Active areas include generating synthetic data to address scarcity or privacy concerns; developing algorithms that optimize data selection and usage (e.g., self-paced learning, active learning); and addressing issues such as data contamination and class imbalance through data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data strongly influence model performance, generalization, and robustness across applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
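The data-selection methods mentioned above, such as active learning, share a common loop: query the current model on an unlabeled pool and label the examples it is least certain about. A minimal sketch of uncertainty sampling follows; the one-parameter toy model (`predict_proba` with threshold `w`) is purely illustrative and not taken from any of the papers listed below.

```python
import math

def predict_proba(w, x):
    """Toy one-parameter 'model': logistic score of x against a threshold w.
    (Hypothetical stand-in for a real trained classifier.)"""
    return 1.0 / (1.0 + math.exp(-(x - w)))

def uncertainty_sample(w, pool, k):
    """Uncertainty sampling: pick the k unlabeled points whose predicted
    probability is closest to 0.5, i.e. where the model is least confident."""
    return sorted(pool, key=lambda x: abs(predict_proba(w, x) - 0.5))[:k]

# With threshold w = 5.0, the points nearest the decision boundary are selected.
pool = [0, 2, 4, 6, 8, 10]
selected = uncertainty_sample(5.0, pool, 2)
```

In a real pipeline the selected examples would be sent for labeling, added to the training set, and the model retrained before the next query round.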
Papers
Wakeword Detection under Distribution Shifts
Sree Hari Krishnan Parthasarathi, Lu Zeng, Christin Jose, Joseph Wang
Normalized gradient flow optimization in the training of ReLU artificial neural networks
Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg Weiss
Developing a Component Comment Extractor from Product Reviews on E-Commerce Sites
Shogo Anda, Masato Kikuchi, Tadachika Ozono
Training Transformers Together
Alexander Borzunov, Max Ryabinin, Tim Dettmers, Quentin Lhoest, Lucile Saulnier, Michael Diskin, Yacine Jernite, Thomas Wolf
Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps
Alexandre Pasquiou, Yair Lakretz, John Hale, Bertrand Thirion, Christophe Pallier
FedHeN: Federated Learning in Heterogeneous Networks
Durmus Alp Emre Acar, Venkatesh Saligrama
EpiGRAF: Rethinking training of 3D GANs
Ivan Skorokhodov, Sergey Tulyakov, Yiqun Wang, Peter Wonka
Imitation Learning for Nonprehensile Manipulation through Self-Supervised Learning Considering Motion Speed
Yuki Saigusa, Sho Sakaino, Toshiaki Tsuji
Propagation with Adaptive Mask then Training for Node Classification on Attributed Networks
Jinsong Chen, Boyu Li, Qiuting He, Kun He