Synthetic Data Generation
Synthetic data generation creates artificial datasets that mimic the statistical properties of real data, addressing limited data availability, privacy concerns, and the high cost of data annotation. Current research focuses on generative models, including diffusion models, generative adversarial networks (GANs), and methods that leverage large language models (LLMs), to produce high-fidelity synthetic data across diverse data types (tabular, image, text, and 3D models). The field matters most where real data is scarce, expensive, or sensitive: it enables the training of robust models in those settings and can improve the reliability and fairness of AI systems.
Papers
Synthetica: Large Scale Synthetic Data for Robot Perception
Ritvik Singh, Jingzhou Liu, Karl Van Wyk, Yu-Wei Chao, Jean-Francois Lafleche, Florian Shkurti, Nathan Ratliff, Ankur Handa
zGAN: An Outlier-focused Generative Adversarial Network For Realistic Synthetic Data Generation
Azizjon Azimi, Bonu Boboeva, Ilyas Varshavskiy, Shuhrat Khalilbekov, Akhlitdin Nizamitdinov, Najima Noyoftova, Sergey Shulgin
No more hard prompts: SoftSRV prompting for synthetic data generation
Giulia DeSalvo, Jean-Francois Kagy, Lazaros Karydas, Afshin Rostamizadeh, Sanjiv Kumar
LLM4GRN: Discovering Causal Gene Regulatory Networks with LLMs -- Evaluation through Synthetic Data Generation
Tejumade Afonja, Ivaxi Sheth, Ruta Binkyte, Waqar Hanif, Thomas Ulas, Matthias Becker, Mario Fritz