Dataset Diversity
Dataset diversity, the variety of features and characteristics a dataset covers, is crucial for training robust, generalizable machine learning models, particularly in sensitive domains such as healthcare and autonomous driving. Current research focuses on quantifying diversity with metrics that go beyond dataset size and class balance. Related work uses conditional variational autoencoders for privacy-preserving data augmentation and large language models for data annotation and taxonomy construction. Improving dataset diversity is vital for enhancing model performance, mitigating bias, and ensuring fairness and reliability across real-world applications.
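To illustrate why size and class balance alone can miss diversity, here is a minimal sketch. It assumes features are numeric vectors and uses two simple, illustrative measures: normalized Shannon entropy of the labels (a class-balance measure) and mean pairwise Euclidean distance between samples (a crude feature-spread measure). These are not the specific metrics proposed in the papers listed below, and the function names are hypothetical. Two datasets with the same size and perfect class balance can still differ sharply in feature spread.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Sequence


def class_balance_entropy(labels: Sequence[str]) -> float:
    """Normalized Shannon entropy of the label distribution.

    Returns 1.0 for perfectly balanced classes and 0.0 when only one class is present.
    """
    counts = Counter(labels)
    n = len(labels)
    if len(counts) < 2:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))  # divide by the maximum possible entropy


def mean_pairwise_distance(features: Sequence[Sequence[float]]) -> float:
    """Average Euclidean distance over all unordered pairs of feature vectors."""
    pairs = list(combinations(features, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: both datasets have 4 samples and 2 balanced classes.
clustered_x = [(0.0, 0.0), (0.0, 0.1), (0.0, 0.2), (0.0, 0.3)]
clustered_y = ["a", "a", "b", "b"]
spread_x = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0), (4.0, 3.0)]
spread_y = ["a", "b", "a", "b"]

if __name__ == "__main__":
    for name, x, y in [("clustered", clustered_x, clustered_y),
                       ("spread", spread_x, spread_y)]:
        print(name,
              round(class_balance_entropy(y), 3),
              round(mean_pairwise_distance(x), 3))
```

Both toy datasets score identically on size and label entropy, while the mean pairwise distance separates the near-duplicate set from the spread-out one. Published diversity metrics are typically more sophisticated (for example, operating on learned embeddings rather than raw features), but the contrast is the same.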
Papers
November 29, 2024
October 29, 2024
August 8, 2024
August 1, 2024
July 29, 2024
July 25, 2024
July 22, 2024
July 18, 2024
July 11, 2024
April 22, 2024
March 25, 2024
February 4, 2024
October 1, 2023