Training Data
Training data is crucial to machine learning model development, and current research focuses on improving data quality, using data more efficiently, and mitigating bias. Active areas include generating synthetic data to address scarcity or privacy concerns, developing algorithms that optimize data selection and usage (e.g., self-paced learning, active learning), and mitigating issues such as data contamination and imbalance with techniques including data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data significantly affect model performance, generalization, and robustness in applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
Papers
Data Curation Alone Can Stabilize In-context Learning
Ting-Yun Chang, Robin Jia
Weakly supervised training of universal visual concepts for multi-domain semantic segmentation
Petra Bevandić, Marin Oršić, Ivan Grubišić, Josip Šarić, Siniša Šegvić
Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages
Arnav Mhaske, Harshit Kedia, Sumanth Doddapaneni, Mitesh M. Khapra, Pratyush Kumar, Rudra Murthy, Anoop Kunchukuttan
An Augmentation Strategy for Visually Rich Documents
Jing Xie, James B. Wendt, Yichao Zhou, Seth Ebner, Sandeep Tata
Reconstructing Training Data from Model Gradient, Provably
Zihan Wang, Jason D. Lee, Qi Lei
Towards Fleet-wide Sharing of Wind Turbine Condition Information through Privacy-preserving Federated Learning
Lorin Jenkel, Stefan Jonas, Angela Meyer
Dock2D: Synthetic data for the molecular recognition problem
Siddharth Bhadra-Lobo, Georgy Derevyanko, Guillaume Lamoureux
Task-Specific Embeddings for Ante-Hoc Explainable Text Classification
Kishaloy Halder, Josip Krapac, Alan Akbik, Anthony Brew, Matti Lyra
MLC at HECKTOR 2022: The Effect and Importance of Training Data when Analyzing Cases of Head and Neck Tumors using Machine Learning
Vajira Thambawita, Andrea M. Storås, Steven A. Hicks, Pål Halvorsen, Michael A. Riegler