Synthetic Data
Synthetic data generation aims to create artificial datasets that mimic the statistical properties of real-world data, addressing limitations like data scarcity, privacy concerns, and high annotation costs. Current research focuses on developing sophisticated generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to various data types (images, text, tabular data, audio). This rapidly evolving field significantly impacts diverse scientific domains and practical applications by enabling the training of robust machine learning models in situations where real data is insufficient or ethically problematic, ultimately improving model performance and expanding research possibilities.
Papers
Utilizing Synthetic Data in Supervised Learning for Robust 5-DoF Magnetic Marker Localization
Mengfan Wu, Thomas Langerak, Otmar Hilliges, Juan Zarate
Unsupervised Face Recognition using Unlabeled Synthetic Data
Fadi Boutros, Marcel Klemt, Meiling Fang, Arjan Kuijper, Naser Damer
Treatment-RSPN: Recurrent Sum-Product Networks for Sequential Treatment Regimes
Adam Dejl, Harsh Deep, Jonathan Fei, Ardavan Saeedi, Li-wei H. Lehman
Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe
Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim
Differentially Private Language Models for Secure Data Sharing
Justus Mattern, Zhijing Jin, Benjamin Weggenmann, Bernhard Schoelkopf, Mrinmaya Sachan