Dataset Synthesis
Dataset synthesis is the creation of artificial datasets that augment or replace real-world data for training machine learning models, addressing data scarcity, collection cost, and annotation effort. Current research explores synthesis methods that use large language models for text, diffusion models for images, and retrieval augmentation or contrastive learning to improve data quality and diversity. The field matters most in data-limited domains, where it enables efficient model training and may unlock applications such as medical imaging, manufacturing process optimization, and code model fine-tuning.
Papers
August 15, 2024
June 10, 2024
May 16, 2024
October 20, 2023
August 20, 2023
August 18, 2023
August 16, 2023
June 20, 2023
June 12, 2023
February 3, 2023
October 22, 2022