Training Data
Training data is crucial for machine learning model development, and current research focuses on improving data quality and efficiency and on mitigating bias. Active areas include generating synthetic data to address scarcity or privacy concerns, developing algorithms that optimize data selection and usage (e.g., self-paced learning, active learning), and mitigating issues such as data contamination and class imbalance through techniques like data augmentation, selective parameter merging, and novel loss functions. The quality and characteristics of training data strongly affect model performance, generalization, and robustness across applications ranging from natural language processing and image recognition to scientific computing and medical diagnosis.
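As a concrete illustration of one of these directions (data selection via active learning), the short sketch below shows entropy-based uncertainty sampling: given a model's predicted class probabilities for a pool of unlabeled examples, it picks the examples the model is least sure about for annotation. The function name and toy probabilities are illustrative assumptions, not taken from any of the papers listed on this page.

```python
import numpy as np

def uncertainty_sample(probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` unlabeled examples whose predicted class
    distribution has the highest entropy (i.e., least confident)."""
    eps = 1e-12  # avoid log(0)
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(entropy)[-budget:]

# Toy usage: 5 unlabeled examples, 3 classes, select 2 to label next.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.34, 0.33, 0.33],   # highly uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],
    [0.90, 0.05, 0.05],
])
print(uncertainty_sample(probs, budget=2))  # indices of the most uncertain rows
```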
Papers - Page 35
Microphone Conversion: Mitigating Device Variability in Sound Event Classification
Myeonghoon Ryu, Hongseok Oh, Suji Lee, Han Park
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks
Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe
Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning
Chenyang Wang, Junjun Jiang, Xingyu Hu, Xianming Liu, Xiangyang Ji
Comprehensive Exploration of Synthetic Data Generation: A Survey
André Bauer, Simon Trapp, Michael Stenger, Robert Leppich, Samuel Kounev, Mark Leznik, Kyle Chard, Ian Foster
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu, Shan Ning, Xuming He
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan +10
Self-supervised learning for skin cancer diagnosis with limited training data
Hamish Haggerty, Rohitash Chandra
Digger: Detecting Copyright Content Mis-usage in Large Language Model Training
Haodong Li, Gelei Deng, Yi Liu, Kailong Wang, Yuekang Li, Tianwei Zhang, Yang Liu, Guoai Xu, Guosheng Xu, Haoyu Wang
The Duck's Brain: Training and Inference of Neural Networks in Modern Database Engines
Maximilian E. Schüle, Thomas Neumann, Alfons Kemper
SparseProp: Efficient Event-Based Simulation and Training of Sparse Recurrent Spiking Neural Networks
Rainer Engelken
Layer Attack Unlearning: Fast and Accurate Machine Unlearning via Layer Level Attack and Knowledge Distillation
Hyunjune Kim, Sangyong Lee, Simon S. Woo