Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to different data types (images, text, tabular data, audio). The field matters across scientific domains and practical applications because it enables training machine learning models where real data is insufficient or ethically problematic to collect or use, which can improve model performance and widen the range of feasible research.
Papers
Toward responsible face datasets: modeling the distribution of a disentangled latent space for sampling face images from demographic groups
Parsa Rahimi, Christophe Ecabert, Sebastien Marcel
Let's Roll: Synthetic Dataset Analysis for Pedestrian Detection Across Different Shutter Types
Yue Hu, Gourav Datta, Kira Beerel, Peter Beerel
Limited-Angle Tomography Reconstruction via Deep End-To-End Learning on Synthetic Data
Thomas Germer, Jan Robine, Sebastian Konietzny, Stefan Harmeling, Tobias Uelwer
A plug-and-play synthetic data deep learning for undersampled magnetic resonance image reconstruction
Min Xiao, Zi Wang, Jiefeng Guo, Xiaobo Qu
Towards High-Quality Specular Highlight Removal by Leveraging Large-Scale Synthetic Data
Gang Fu, Qing Zhang, Lei Zhu, Chunxia Xiao, Ping Li
SynVox2: Towards a privacy-friendly VoxCeleb2 dataset
Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Nicholas Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier