Noisy Supervision

Noisy supervision, where training data contains inaccurate or incomplete labels, is a pervasive challenge across machine learning domains, hindering model accuracy and generalization. Current research focuses on mitigating this noise through techniques like synthetic data generation, mixture-of-experts models, and uncertainty estimation, often incorporating pre-trained models (e.g., SAM) to improve label quality. These advancements are crucial for improving the robustness and efficiency of machine learning models trained on real-world, often imperfect, datasets, impacting fields ranging from image recognition and natural language processing to reinforcement learning.

Papers