Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to specific data types such as images, text, tabular data, and audio. The field is evolving rapidly and affects many scientific domains and practical applications: it enables training machine learning models where real data is insufficient or raises ethical concerns, which can improve model performance and widen the range of feasible research.
Papers
GlucoSynth: Generating Differentially-Private Synthetic Glucose Traces
Josephine Lamp, Mark Derdzinski, Christopher Hannemann, Joost van der Linden, Lu Feng, Tianhao Wang, David Evans
Creating Synthetic Datasets for Collaborative Filtering Recommender Systems using Generative Adversarial Networks
Jesús Bobadilla, Abraham Gutiérrez, Raciel Yera, Luis Martínez
Analyzing Effects of Fake Training Data on the Performance of Deep Learning Systems
Pratinav Seth, Akshat Bhandari, Kumud Lakara
Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves
Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka, Rio Yokota
3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data
Nicolai Häni, Jun-Jee Chao, Volkan Isler
SurvivalGAN: Generating Time-to-Event Data for Survival Analysis
Alexander Norcliffe, Bogdan Cebere, Fergus Imrie, Pietro Lio, Mihaela van der Schaar
Membership Inference Attacks against Synthetic Data through Overfitting Detection
Boris van Breugel, Hao Sun, Zhaozhi Qian, Mihaela van der Schaar