Data Curation
Data curation is the systematic collection, organization, and refinement of datasets to improve the performance and reliability of machine learning models. Current research emphasizes automated curation, using large language models (LLMs) to improve data quality, address biases, and filter large-scale datasets efficiently, often with methods such as embedding-based filtering and curriculum learning. This work matters for fields including natural language processing, computer vision, and biomedical research, where training robust AI systems depends on high-quality datasets with as little bias as possible.
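As a rough illustration of what embedding-based filtering can look like, the sketch below drops near-duplicate items by comparing embedding vectors with cosine similarity. It is a minimal example under stated assumptions, not a method taken from any particular paper: the function names, the 0.95 threshold, and the two-dimensional toy vectors are all hypothetical, and real pipelines would obtain embeddings from a trained model and use an approximate nearest-neighbor index rather than this quadratic loop.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_near_duplicates(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[int]:
    """Greedily keep the indices of items that are not too similar to any kept item.

    The threshold is a hypothetical value chosen for illustration only.
    """
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        # Keep item i only if it is below the threshold against every kept item.
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings (made up): the second vector nearly duplicates the first.
toy_embeddings = [
    [1.0, 0.0],
    [0.99, 0.05],
    [0.0, 1.0],
    [0.7, 0.7],
]

kept_indices = filter_near_duplicates(toy_embeddings, threshold=0.95)
print(kept_indices)
```

The greedy pass keeps the first item of each near-duplicate group, so the result depends on input order; sorting by a quality score beforehand is one common way to decide which copy survives.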