Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to different data types (images, text, tabular data, audio). The field matters across scientific domains and practical applications because it allows machine learning models to be trained when real data is insufficient or ethically problematic to use, which can improve model performance and open research directions that real data alone would not support.
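To make "mimicking the statistical properties of real data" concrete, here is a minimal sketch in Python. It is not one of the generative models named above: it assumes a far simpler stand-in, fitting an independent Gaussian to each numeric column and sampling from it. The toy rows and the function names `fit_gaussian_columns` and `sample_synthetic` are hypothetical, chosen only for illustration.

```python
import random
import statistics


def fit_gaussian_columns(rows):
    """Estimate a (mean, standard deviation) pair for each numeric column."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample_synthetic(params, n, seed=0):
    """Draw n synthetic rows, sampling each column independently."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in params) for _ in range(n)]


# Toy "real" data, invented for this example: (height in cm, weight in kg).
real_rows = [
    (170.0, 65.0),
    (182.0, 80.0),
    (158.0, 52.0),
    (175.0, 72.0),
    (165.0, 60.0),
]

params = fit_gaussian_columns(real_rows)
synthetic_rows = sample_synthetic(params, n=1000)

# Compare per-column means of the real and synthetic data.
for i, (mu, _sigma) in enumerate(params):
    synthetic_mean = statistics.mean(row[i] for row in synthetic_rows)
    print(f"column {i}: real mean = {mu:.2f}, synthetic mean = {synthetic_mean:.2f}")
```

Because each column is sampled independently, this sketch preserves per-column means and variances but discards correlations between columns. Capturing that kind of joint structure is what the more expressive models listed above are meant to do.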
Papers
Post-processing Private Synthetic Data for Improving Utility on Selected Measures
Hao Wang, Shivchander Sudalairaj, John Henning, Kristjan Greenewald, Akash Srivastava
Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science
Veniamin Veselovsky, Manoel Horta Ribeiro, Akhil Arora, Martin Josifoski, Ashton Anderson, Robert West
Realistically distributing object placements in synthetic training data improves the performance of vision-based object detection models
Setareh Dabiri, Vasileios Lioutas, Berend Zwartsenberg, Yunpeng Liu, Matthew Niedoba, Xiaoxuan Liang, Dylan Green, Justice Sefas, Jonathan Wilder Lavington, Frank Wood, Adam Scibior
Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques
Andrea Lampis, Eugenio Lomurno, Matteo Matteucci
Face Recognition Using Synthetic Face Data
Omer Granoviter, Alexey Gruzdev, Vladimir Loginov, Max Kogan, Orly Zvitia
Utility Theory of Synthetic Data Generation
Shirong Xu, Will Wei Sun, Guang Cheng
Fashion CUT: Unsupervised domain adaptation for visual pattern classification in clothes using synthetic data and pseudo-labels
Enric Moreu, Alex Martinelli, Martina Naughton, Philip Kelly, Noel E. O'Connor
Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy
Aryan Jadon, Shashank Kumar
Novel Synthetic Data Tool for Data-Driven Cardboard Box Localization
Lukáš Gajdošech, Peter Kravár