Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to various data types (images, text, tabular data, audio). This rapidly evolving field affects many scientific domains and practical applications by enabling robust machine learning models to be trained where real data is insufficient or ethically problematic, which can improve model performance and expand research possibilities.
Papers
Drone Detection using Deep Neural Networks Trained on Pure Synthetic Data
Mariusz Wisniewski, Zeeshan A. Rana, Ivan Petrunin, Alan Holt, Stephen Harman
Generalized Pose Space Embeddings for Training In-the-Wild using Analysis-by-Synthesis
Dominik Borer, Jakob Buhmann, Martin Guay
HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere
Hatef Otroshi Shahreza, Sébastien Marcel
SynRL: Aligning Synthetic Clinical Trial Data with Human-preferred Clinical Endpoints Using Reinforcement Learning
Trisha Das, Zifeng Wang, Afrah Shafquat, Mandis Beigi, Jason Mezey, Jimeng Sun
Hierarchical Conditional Tabular GAN for Multi-Tabular Synthetic Data Generation
Wilhelm Ågren, Victorio Úbeda Sosa
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
Yeming Wen, Swarat Chaudhuri
Differential Privacy Under Class Imbalance: Methods and Empirical Insights
Lucas Rosenblatt, Yuliia Lut, Eitan Turok, Marco Avella-Medina, Rachel Cummings
Cancer-Net SCa-Synth: An Open Access Synthetically Generated 2D Skin Lesion Dataset for Skin Cancer Classification
Chi-en Amy Tai, Oustan Ding, Alexander Wong