Training Data
Training data is central to machine learning model development, and current research focuses on improving data quality and efficiency and on mitigating bias. Active areas include generating synthetic data to address scarcity or privacy concerns; developing algorithms that optimize data selection and usage (e.g., self-paced learning, active learning); and addressing data contamination and class imbalance through techniques such as data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data strongly affect model performance, generalization, and robustness across applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
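As a concrete illustration of one of the data-selection techniques named above, the sketch below shows a minimal uncertainty-based active-learning step: score unlabeled samples by predictive entropy and pick the most uncertain ones for labeling. The function names (`entropy_uncertainty`, `select_for_labeling`) and the toy probability pool are hypothetical, not drawn from any of the listed papers.

```python
import numpy as np

def entropy_uncertainty(probs):
    """Predictive entropy per sample; higher means more uncertain."""
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_for_labeling(probs, budget):
    """Return indices of the `budget` most uncertain unlabeled samples."""
    scores = entropy_uncertainty(probs)
    return np.argsort(scores)[::-1][:budget]

# Toy pool: softmax outputs over 3 classes for 4 unlabeled samples.
pool_probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform -> most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],  # two competing classes
])
picked = select_for_labeling(pool_probs, budget=2)
```

In practice the selected samples would be sent to annotators, added to the labeled set, and the model retrained; entropy is only one of several common acquisition scores (margin and least-confidence sampling are frequent alternatives).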
Papers
Studying Accuracy of Machine Learning Models Trained on Lab Lifting Data in Solving Real-World Problems Using Wearable Sensors for Workplace Safety
Joseph Bertrand, Nick Griffey, Ming-Lun Lu, Rashmi Jha
DoG-Instruct: Towards Premium Instruction-Tuning Data via Text-Grounded Instruction Wrapping
Yongrui Chen, Haiyun Jiang, Xinting Huang, Shuming Shi, Guilin Qi
Employing Real Training Data for Deep Noise Suppression
Ziyi Xu, Marvin Sach, Jan Pirklbauer, Tim Fingscheidt
Aggregating Correlated Estimations with (Almost) no Training
Theo Delemazure, François Durand, Fabien Mathieu
Analyzing domain shift when using additional data for the MICCAI KiTS23 Challenge
George Stoica, Mihaela Breaban, Vlad Barbu
On the training and generalization of deep operator networks
Sanghyun Lee, Yeonjong Shin
Comparative Analysis of Deep Learning Architectures for Breast Cancer Diagnosis Using the BreaKHis Dataset
İrem Sayın, Muhammed Ali Soydaş, Yunus Emre Mert, Arda Yarkataş, Berk Ergun, Selma Sözen Yeh, Hüseyin Üvet
Tight Bounds for Machine Unlearning via Differential Privacy
Yiyang Huang, Clément L. Canonne
Analysis of Diagnostics (Part I): Prevalence, Uncertainty Quantification, and Machine Learning
Paul N. Patrone, Raquel A. Binder, Catherine S. Forconi, Ann M. Moormann, Anthony J. Kearsley
Training Towards Critical Use: Learning to Situate AI Predictions Relative to Human Knowledge
Anna Kawakami, Luke Guerdan, Yanghuidi Cheng, Matthew Lee, Scott Carter, Nikos Arechiga, Kate Glazko, Haiyi Zhu, Kenneth Holstein
Shatter and Gather: Learning Referring Image Segmentation with Text Supervision
Dongwon Kim, Namyup Kim, Cuiling Lan, Suha Kwak
Improving the State of the Art for Training Human-AI Teams: Technical Report #3 -- Analysis of Testbed Alternatives
Lillian Asiala, James E. McCarthy, Lixiao Huang
Improving the State of the Art for Training Human-AI Teams: Technical Report #2 -- Results of Researcher Knowledge Elicitation Survey
James E. McCarthy, Lillian Asiala, LeeAnn Maryeski, Dawn Sillars
Improving the State of the Art for Training Human-AI Teams: Technical Report #1 -- Results of Subject-Matter Expert Knowledge Elicitation Survey
James E. McCarthy, Lillian Asiala, LeeAnn Maryeski, Nyla Warren
Analysis of Learned Features and Framework for Potato Disease Detection
Shikha Gupta, Soma Chakraborty, Renu Rameshan
Few-Shot Object Detection via Synthetic Features with Optimal Transport
Anh-Khoa Nguyen Vu, Thanh-Toan Do, Vinh-Tiep Nguyen, Tam Le, Minh-Triet Tran, Tam V. Nguyen