Training Data
Training data is crucial to machine learning model development, and current research focuses on improving data quality and efficiency and on mitigating biases. Active areas include generating synthetic data to address scarcity or privacy concerns; developing algorithms that optimize data selection and usage (e.g., self-paced learning, active learning); and countering data contamination and imbalance through techniques such as data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data strongly affect model performance, generalization, and robustness, with consequences for applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
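To make one of the data-selection techniques above concrete, the following is a minimal sketch of uncertainty sampling, a common active-learning heuristic: from a pool of unlabeled examples, pick the ones the model is least sure about and label those first. The model and pool here are toy stand-ins, not taken from any paper listed below.

```python
def uncertainty_sample(pool, predict_proba, k):
    """Return the k unlabeled examples whose predicted positive-class
    probability is closest to 0.5, i.e. where the model is least certain."""
    scored = sorted(pool, key=lambda x: abs(predict_proba(x) - 0.5))
    return scored[:k]

# Toy setup: each pool item doubles as the model's predicted probability,
# so the "model" is simply the identity function.
pool = [0.1, 0.45, 0.9, 0.52, 0.7]
chosen = uncertainty_sample(pool, lambda x: x, k=2)
print(chosen)  # the two examples nearest 0.5: [0.52, 0.45]
```

In practice `predict_proba` would come from a real classifier retrained after each labeling round; the selection rule itself stays the same.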
Papers
Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models
Ali Ghadirzadeh, Petra Poklukar, Karol Arndt, Chelsea Finn, Ville Kyrki, Danica Kragic, Mårten Björkman
Extracting Targeted Training Data from ASR Models, and How to Mitigate It
Ehsan Amid, Om Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays
Training Fully Connected Neural Networks is $\exists\mathbb{R}$-Complete
Daniel Bertschinger, Christoph Hertrich, Paul Jungeblut, Tillmann Miltzow, Simon Weber
FedSynth: Gradient Compression via Synthetic Data in Federated Learning
Shengyuan Hu, Jack Goetz, Kshitiz Malik, Hongyuan Zhan, Zhe Liu, Yue Liu
From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization
Federico Landini, Alicia Lozano-Diez, Mireia Diez, Lukáš Burget
End-to-end model for named entity recognition from speech without paired training data
Salima Mdhaffar, Jarod Duret, Titouan Parcollet, Yannick Estève
Federated Learning for the Classification of Tumor Infiltrating Lymphocytes
Ujjwal Baid, Sarthak Pati, Tahsin M. Kurc, Rajarsi Gupta, Erich Bremer, Shahira Abousamra, Siddhesh P. Thakur, Joel H. Saltz, Spyridon Bakas
Perfectly Accurate Membership Inference by a Dishonest Central Server in Federated Learning
Georg Pichler, Marco Romanelli, Leonardo Rey Vega, Pablo Piantanida