Synthetic Data
Synthetic data generation aims to create artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on developing generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to specific data types (images, text, tabular data, audio). This rapidly evolving field affects diverse scientific domains and practical applications. It enables robust machine learning models to be trained where real data is insufficient or ethically problematic to use, which can improve model performance and expand research possibilities.
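As a minimal illustration of "mimicking the statistical properties of real data", the sketch below fits a mean vector and covariance matrix to a numeric table and samples new rows from a multivariate Gaussian with those parameters. It is a deliberately simple baseline, not one of the generative models named above and not a method from any paper listed here. It assumes all columns are numeric and the covariance is positive definite, and it provides no privacy guarantee. The function names `fit_gaussian`, `cholesky`, and `sample_synthetic` are hypothetical, and the "real" table in the usage block is toy data.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fit_gaussian(rows: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Estimate the mean vector and unbiased sample covariance of numeric rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with L L^T == a (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_synthetic(mean: List[float], cov: Matrix, n: int, seed: int = 0) -> Matrix:
    """Draw n synthetic rows x = mean + L z, where z ~ N(0, I) and L L^T = cov."""
    rng = random.Random(seed)
    L = cholesky(cov)
    d = len(mean)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # L is lower triangular, so component i only needs z[0..i].
        rows.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return rows


if __name__ == "__main__":
    # Toy stand-in for a "real" table: two correlated numeric columns.
    rng = random.Random(42)
    real = []
    for _ in range(2000):
        x = rng.gauss(50.0, 10.0)
        real.append([x, 0.5 * x + rng.gauss(0.0, 3.0)])
    mean, cov = fit_gaussian(real)
    synthetic = sample_synthetic(mean, cov, n=5000, seed=1)
    print("real mean:     ", [round(v, 2) for v in mean])
    print("synthetic mean:", [round(v, 2) for v in fit_gaussian(synthetic)[0]])
```

Because only means and covariances are matched, nonlinear dependencies, multimodal distributions, and categorical columns are not reproduced; capturing that richer structure is what the more expressive generative models mentioned above are used for.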
Papers
Robust Category-Level 3D Pose Estimation from Synthetic Data
Jiahao Yang, Wufei Ma, Angtian Wang, Xiaoding Yuan, Alan Yuille, Adam Kortylewski
Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation
Rawad Melhem, Assef Jafar, Oumayma Al Dakkak
Differentially Private Synthetic Data via Foundation Model APIs 1: Images
Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Harsha Nori, Sergey Yekhanin
Post-processing Private Synthetic Data for Improving Utility on Selected Measures
Hao Wang, Shivchander Sudalairaj, John Henning, Kristjan Greenewald, Akash Srivastava
Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science
Veniamin Veselovsky, Manoel Horta Ribeiro, Akhil Arora, Martin Josifoski, Ashton Anderson, Robert West
Realistically distributing object placements in synthetic training data improves the performance of vision-based object detection models
Setareh Dabiri, Vasileios Lioutas, Berend Zwartsenberg, Yunpeng Liu, Matthew Niedoba, Xiaoxuan Liang, Dylan Green, Justice Sefas, Jonathan Wilder Lavington, Frank Wood, Adam Scibior
Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques
Andrea Lampis, Eugenio Lomurno, Matteo Matteucci
Face Recognition Using Synthetic Face Data
Omer Granoviter, Alexey Gruzdev, Vladimir Loginov, Max Kogan, Orly Zvitia
Utility Theory of Synthetic Data Generation
Shirong Xu, Will Wei Sun, Guang Cheng