Dataset Refinement

Dataset refinement focuses on improving the quality and utility of the datasets used to train machine learning models, with the goal of enhancing model performance, robustness, and fairness. Current research emphasizes automated methods for identifying and correcting data defects such as noisy labels and biased samples, often employing techniques like Shapley value-based data valuation, generative models (e.g., diffusion models), and human-in-the-loop approaches for iterative refinement; a minimal valuation sketch follows below. These advances are crucial for accelerating the development of reliable and effective AI systems across diverse applications, from robotics and medical image analysis to natural language processing.
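
As a concrete illustration of the Shapley-value idea, the sketch below estimates per-example data Shapley values with a plain Monte Carlo procedure and flags the lowest-valued training points as candidates for relabeling or removal. The synthetic dataset, the choice of scikit-learn's LogisticRegression as the utility model, and the permutation count are illustrative assumptions, not taken from any particular paper.

```python
# Minimal Monte Carlo data-Shapley sketch for flagging suspicious training
# points (e.g., mislabeled samples). All names, data, and thresholds here
# are illustrative assumptions, not a specific published implementation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def monte_carlo_data_shapley(X_train, y_train, X_val, y_val,
                             n_permutations=30, seed=0):
    """Estimate each training point's Shapley value: its average marginal
    contribution to validation accuracy over random orderings."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    values = np.zeros(n)
    for _ in range(n_permutations):
        perm = rng.permutation(n)
        prev_score = 0.0  # utility of the empty coalition, taken as 0 here
        for i, idx in enumerate(perm):
            subset = perm[: i + 1]
            if len(np.unique(y_train[subset])) < 2:
                # Cannot fit a classifier on a single class; no change in utility.
                score = prev_score
            else:
                model = LogisticRegression(max_iter=200)
                model.fit(X_train[subset], y_train[subset])
                score = model.score(X_val, y_val)
            values[idx] += score - prev_score  # marginal contribution of idx
            prev_score = score
    return values / n_permutations


if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

    # Corrupt a few labels to simulate annotation noise.
    y_noisy = y_tr.copy()
    y_noisy[:5] = 1 - y_noisy[:5]

    shap_values = monte_carlo_data_shapley(X_tr, y_noisy, X_val, y_val)

    # Low (often negative) values flag candidates for relabeling or removal.
    suspects = np.argsort(shap_values)[:5]
    print("Lowest-valued training indices:", suspects)
```

In practice, exact enumeration over all orderings is intractable, so published methods typically add truncation (stopping a permutation once the validation score stabilizes) or group-wise valuation to keep the cost manageable on larger datasets; the brute-force loop above is only meant to show the core idea.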

Papers