Synthetic Dataset
Synthetic datasets are artificially generated datasets designed to mimic the statistical properties of real-world data. They are used mainly to address data scarcity, privacy concerns, or high annotation costs in machine learning applications. Current research focuses on improving the fidelity and diversity of synthetic data using generative models such as variational autoencoders, generative adversarial networks, and diffusion models, often combined with techniques such as knowledge distillation and trajectory matching to make generation and training more efficient. Developing and validating high-quality synthetic datasets is crucial for advancing machine learning in fields such as healthcare, robotics, and remote sensing, where acquiring sufficient real data is difficult or ethically problematic.
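As a rough illustration of what "mimicking the statistical properties" of real data can mean, the sketch below fits a multivariate Gaussian to a small stand-in "real" dataset and samples a larger synthetic one from it. This is an assumption made for brevity: a Gaussian is a deliberately simple substitute for the generative models named above, and the function names (`fit_gaussian`, `sample_synthetic`), the toy data, and the gap metrics are hypothetical and not taken from any of the listed papers.

```python
import numpy as np


def fit_gaussian(real):
    """Estimate the mean vector and covariance matrix (rows are samples)."""
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return mean, cov


def sample_synthetic(mean, cov, n, seed=0):
    """Draw n synthetic rows from the fitted Gaussian."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)


# Stand-in for a scarce real dataset (hypothetical toy data, 500 rows).
rng = np.random.default_rng(42)
real = rng.multivariate_normal([1.0, -2.0], [[1.0, 0.6], [0.6, 2.0]], size=500)

mean, cov = fit_gaussian(real)
synthetic = sample_synthetic(mean, cov, n=5000)

# Crude fidelity check: compare first and second moments of the two sets.
mean_gap = np.abs(synthetic.mean(axis=0) - mean).max()
cov_gap = np.abs(np.cov(synthetic, rowvar=False) - cov).max()
print(f"max mean gap: {mean_gap:.3f}, max covariance gap: {cov_gap:.3f}")
```

Matching means and covariances is only a weak notion of fidelity. It says nothing about diversity, privacy leakage, or usefulness for a downstream task, which is why validation of synthetic datasets is treated as a research problem in its own right.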
Papers
Towards Synthetic Data Generation for Improved Pain Recognition in Videos under Patient Constraints
Jonas Nasimzada, Jens Kleesiek, Ken Herrmann, Alina Roitberg, Constantin Seibold
TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models
Andrei Margeloiu, Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik
Synthetic data augmentation for robotic mobility aids to support blind and low vision people
Hochul Hwang, Krisha Adhikari, Satya Shodhaka, Donghyun Kim
Image-to-Image Translation Based on Deep Generative Modeling for Radiotherapy Synthetic Dataset Creation
Olga Glazunova, Cecile J.A. Wolfs, Frank Verhaegen
Deep Generative Model for Mechanical System Configuration Design
Yasaman Etesam, Hyunmin Cheong, Mohammadmehdi Ataei, Pradeep Kumar Jayaraman
Predicting Critical Heat Flux with Uncertainty Quantification and Domain Generalization Using Conditional Variational Autoencoders and Deep Neural Networks
Farah Alsafadi, Aidan Furlong, Xu Wu
SynMorph: Generating Synthetic Face Morphing Dataset with Mated Samples
Haoyu Zhang, Raghavendra Ramachandra, Kiran Raja, Christoph Busch