Synthetic Data
Synthetic data generation aims to create artificial datasets that mimic the statistical properties of real-world data, addressing limitations like data scarcity, privacy concerns, and high annotation costs. Current research focuses on developing sophisticated generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to various data types (images, text, tabular data, audio). This rapidly evolving field significantly impacts diverse scientific domains and practical applications by enabling the training of robust machine learning models in situations where real data is insufficient or ethically problematic, ultimately improving model performance and expanding research possibilities.
Papers
Transferring disentangled representations: bridging the gap between synthetic and real images
Jacopo Dapueto, Nicoletta Noceti, Francesca Odone
Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment
Jiawei Du, Xin Zhang, Juncheng Hu, Wenxin Huang, Joey Tianyi Zhou
Exploring synthetic data for cross-speaker style transfer in style representation based TTS
Lucas H. Ueda, Leonardo B. de M. M. Marques, Flávio O. Simões, Mário U. Neto, Fernando Runstein, Bianca Dal Bó, Paula D. P. Costa
The poison of dimensionality
Lê-Nguyên Hoang
KIPPS: Knowledge infusion in Privacy Preserving Synthetic Data Generation
Anantaa Kotal, Anupam Joshi
Towards Synthetic Data Generation for Improved Pain Recognition in Videos under Patient Constraints
Jonas Nasimzada, Jens Kleesiek, Ken Herrmann, Alina Roitberg, Constantin Seibold
Quality Matters: Evaluating Synthetic Data for Tool-Using LLMs
Shadi Iskander, Nachshon Cohen, Zohar Karnin, Ori Shapira, Sofia Tolmach
TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models
Andrei Margeloiu, Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik
Advancing Employee Behavior Analysis through Synthetic Data: Leveraging ABMs, GANs, and Statistical Models for Enhanced Organizational Efficiency
Rakshitha Jayashankar, Mahesh Balan
A Distribution-Aware Flow-Matching for Generating Unstructured Data for Few-Shot Reinforcement Learning
Mohammad Pivezhandi, Abusayeed Saifullah