Dataset Quality
Dataset quality is crucial for training effective machine learning models: it affects model performance and reliability across applications as varied as medical image analysis and fake news detection. Current research moves beyond simple metrics such as dataset size and class balance toward more nuanced measures of data diversity and density, often borrowing techniques from ecology and information theory to quantify the effective richness of the data. This improved understanding is driving the development of frameworks for automated quality assessment and of data-centric methods for improving existing datasets, with the aim of making machine learning models more trustworthy and generalizable.
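To make the idea of "effective richness" concrete, here is a minimal sketch of one diversity measure borrowed from ecology: the Hill number, which at order 1 equals the exponential of the Shannon entropy. The summary above does not name a specific metric, so this choice is an assumption made for illustration, and the function name `effective_number` and the toy label lists are hypothetical.

```python
import math
from collections import Counter


def effective_number(labels, q=1.0):
    """Hill number of order q over the label frequencies.

    Illustrative only: one possible diversity measure, not a metric
    prescribed by the text above.
    q = 0 counts the distinct classes (richness),
    q = 1 is exp(Shannon entropy),
    q = 2 is the inverse Simpson index.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must be non-empty")
    probs = [c / total for c in counts.values()]
    if math.isclose(q, 1.0):
        # Limit of the general formula as q approaches 1.
        entropy = -sum(p * math.log(p) for p in probs)
        return math.exp(entropy)
    return sum(p ** q for p in probs) ** (1.0 / (1.0 - q))


# Hypothetical toy datasets: same size, same four classes, different balance.
balanced = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 97 + ["b", "c", "d"]

print(effective_number(balanced))  # close to 4: every class contributes equally
print(effective_number(skewed))    # well below 4: one class dominates
```

Both toy datasets contain 100 examples and four classes, so size and class count cannot tell them apart, while the effective number of classes can. The same function could be applied to cluster assignments of feature embeddings instead of labels, although that choice is also an assumption rather than something the summary specifies.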