Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to specific data types (images, text, tabular data, audio). The field matters across scientific domains and practical applications because it enables training machine learning models where real data is insufficient or ethically problematic, which can improve model performance and broaden the range of feasible research.
Papers
Synthetic ECG Generation for Data Augmentation and Transfer Learning in Arrhythmia Classification
José Fernando Núñez, Jaime Arjona, Javier Béjar
Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems
Miha Malenšek, Blaž Škrlj, Blaž Mramor, Jure Demšar
Training Data Synthesis with Difficulty Controlled Diffusion Model
Zerun Wang, Jiafeng Mao, Xueting Wang, Toshihiko Yamasaki
RealSeal: Revolutionizing Media Authentication with Real-Time Realism Scoring
Bhaktipriya Radharapu, Harish Krishna
Synthetic Data Generation with LLM for Improved Depression Prediction
Andrea Kang, Jun Yu Chen, Zoe Lee-Youngzie, Shuhao Fu
Pre-training for Action Recognition with Automatically Generated Fractal Datasets
Davyd Svyezhentsev, George Retsinas, Petros Maragos
SynDiff-AD: Improving Semantic Segmentation and End-to-End Autonomous Driving with Synthetic Data from Latent Diffusion Models
Harsh Goel, Sai Shankar Narasimhan, Oguzhan Akcin, Sandeep Chinchali
DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing
Utsab Saha, Tanvir Muntakim Tonoy, Hafiz Imtiaz
High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR
Sourav Banerjee, Ayushi Agarwal, Promila Ghosh
Beyond Data Scarcity: A Frequency-Driven Framework for Zero-Shot Forecasting
Liran Nochumsohn, Michal Moshkovitz, Orly Avner, Dotan Di Castro, Omri Azencot
Tackling Data Heterogeneity in Federated Time Series Forecasting
Wei Yuan, Guanhua Ye, Xiangyu Zhao, Quoc Viet Hung Nguyen, Yang Cao, Hongzhi Yin
AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks
You Li, Fan Ma, Yi Yang