Real World Noisy Datasets

Real-world datasets are frequently contaminated with noisy labels, hindering the performance of machine learning models. Current research focuses on developing robust training methods that mitigate the impact of this noise, employing techniques like sample selection (identifying and removing or correcting noisy samples), noise-robust loss functions, and the integration of external knowledge sources (e.g., large language models). These advancements are crucial for improving the reliability and generalizability of models trained on real-world data, impacting diverse fields from medical image analysis to autonomous driving where perfectly labeled data is often unavailable or prohibitively expensive to obtain.

Papers