Dataset Creation
Dataset creation for machine learning, particularly in complex domains such as natural language processing and computer vision, is an active research area focused on improving data quality, efficiency, and representativeness. Current efforts include building automated pipelines for data generation and annotation, using large language models to streamline that process, and applying techniques such as auction mechanisms to optimize resource allocation. These advances are important for the reliability and generalizability of machine learning models, with applications in fields ranging from legal tech and finance to healthcare and industrial automation.
Papers
November 11, 2024
October 23, 2024
October 11, 2024
September 27, 2024
September 18, 2024
September 11, 2024
August 22, 2024
August 20, 2024
July 11, 2024
July 9, 2024
July 8, 2024
June 24, 2024
June 21, 2024
June 3, 2024
May 22, 2024
May 8, 2024
March 1, 2024
February 26, 2024
February 19, 2024
November 27, 2023