Label Noise
Label noise, the presence of incorrect labels in training datasets, degrades the performance and robustness of machine learning models. Current research focuses on mitigating it through loss function modifications, sample selection strategies (e.g., identifying noisy samples and then removing or down-weighting them), and robust algorithms such as those based on nearest neighbors or contrastive learning, often applied within deep neural networks or gradient boosted decision trees. Addressing label noise is crucial for the reliability and generalizability of models in applications ranging from medical image analysis to natural language processing, and it is driving the development of new benchmark datasets and evaluation metrics.
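As a rough illustration of the sample selection idea, the sketch below applies a small-loss criterion: samples whose given label incurs a high loss under the current model are treated as likely noisy and given zero weight. This is a minimal sketch of one common heuristic, not the method of any paper listed here. The function names, the `keep_fraction` parameter, and the toy predictions are all assumptions made for illustration.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of one predicted distribution against an integer label."""
    # Clamp to avoid log(0) when the model assigns zero probability.
    return -math.log(max(probs[label], 1e-12))


def small_loss_weights(all_probs, labels, keep_fraction=0.8):
    """Weight 1.0 for the keep_fraction of samples with the smallest loss, else 0.0.

    keep_fraction is a hypothetical hyperparameter; in practice it would be
    tuned or tied to an estimate of the noise rate.
    """
    losses = [cross_entropy(p, y) for p, y in zip(all_probs, labels)]
    n_keep = max(1, int(round(keep_fraction * len(losses))))
    # Indices sorted from smallest to largest loss.
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    kept = set(order[:n_keep])
    return [1.0 if i in kept else 0.0 for i in range(len(losses))]


# Toy predictions from a hypothetical 2-class model. The last sample is
# labeled 1 although the model confidently predicts class 0.
probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.95, 0.05]]
labels = [0, 1, 0, 1]
weights = small_loss_weights(probs, labels, keep_fraction=0.75)
print(weights)
```

The resulting weights could multiply per-sample losses during training, so that suspected noisy samples are down-weighted instead of being removed from the dataset outright.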
Papers
Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise
Bidur Khanal, Tianhong Dai, Binod Bhattarai, Cristian Linte
An accurate detection is not all you need to combat label noise in web-noisy datasets
Paul Albert, Jack Valmadre, Eric Arazo, Tarun Krishna, Noel E. O'Connor, Kevin McGuinness