Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to specific data types (images, text, tabular data, audio). The field matters across scientific domains and practical applications because it enables training machine learning models where real data is insufficient or ethically problematic to collect, which can improve model performance and broaden the range of feasible research.
Papers
Synthetic Image Data for Deep Learning
Jason W. Anderson, Marcin Ziolkowski, Ken Kennedy, Amy W. Apon
Evaluation of Synthetic Datasets for Conversational Recommender Systems
Harsh Lara, Manoj Tiwari
Accelerating Dataset Distillation via Model Augmentation
Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu
Cross-Domain Synthetic-to-Real In-the-Wild Depth and Normal Estimation for 3D Scene Understanding
Jay Bhanushali, Manivannan Muniyandi, Praneeth Chakravarthula
Synthetic Data for Object Classification in Industrial Applications
August Baaz, Yonan Yonan, Kevin Hernandez-Diaz, Fernando Alonso-Fernandez, Felix Nilsson
Generative Data Augmentation for Non-IID Problem in Decentralized Clinical Machine Learning
Zirui Wang, Shaoming Duan, Chengyue Wu, Wenhao Lin, Xinyu Zha, Peiyi Han, Chuanyi Liu
PASTA: Proportional Amplitude Spectrum Training Augmentation for Syn-to-Real Domain Generalization
Prithvijit Chattopadhyay, Kartik Sarangmath, Vivek Vijaykumar, Judy Hoffman
A Pipeline for Generating, Annotating and Employing Synthetic Data for Real World Question Answering
Matthew Maufe, James Ravenscroft, Rob Procter, Maria Liakata
Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders
Ciro Antonio Mami, Andrea Coser, Eric Medvet, Alexander T. P. Boudewijn, Marco Volpe, Michael Whitworth, Borut Svara, Gabriele Sgroi, Daniele Panfilo, Sebastiano Saccani
Synthetic data enable experiments in atomistic machine learning
John L. A. Gardner, Zoé Faure Beaulieu, Volker L. Deringer
Procedural Image Programs for Representation Learning
Manel Baradad, Chun-Fu Chen, Jonas Wulff, Tongzhou Wang, Rogerio Feris, Antonio Torralba, Phillip Isola
Analysis of Training Object Detection Models with Synthetic Data
Bram Vanherle, Steven Moonen, Frank Van Reeth, Nick Michiels
Chart-RCNN: Efficient Line Chart Data Extraction from Camera Images
Shufan Li, Congxi Lu, Linkai Li, Haoshuai Zhou
CAD2Render: A Modular Toolkit for GPU-accelerated Photorealistic Synthetic Data Generation for the Manufacturing Industry
Steven Moonen, Bram Vanherle, Joris de Hoog, Taoufik Bourgana, Abdellatif Bey-Temsamani, Nick Michiels