Dataset Management
Dataset management covers the creation, curation, and use of datasets for machine learning and scientific discovery, with the aim of improving data quality, accessibility, and reproducibility. Current research emphasizes robust methods for synthetic data generation, the use of large language models to analyze dataset structure and identify subpopulations, and better dataset documentation and sharing practices across platforms. Effective dataset management is crucial for accelerating scientific progress and for ensuring the reliability and fairness of AI applications, particularly in sensitive domains such as healthcare.