Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to particular data types (images, text, tabular data, audio). The field matters across scientific domains and practical applications because it enables training machine learning models where real data is insufficient or ethically problematic to collect, which can improve model performance and open up research that would otherwise be impractical.
Papers
Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma, Varun Jampani, Yuanzhen Li, Xuhui Jia, Dmitry Lagun, Fredo Durand, William T. Freeman, Mark Matthews
Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
Rosario Leonardi, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella
Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction
Zilin Du, Haoxin Li, Xu Guo, Boyang Li
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
Andrea Caraffa, Davide Boscaini, Amir Hamza, Fabio Poiesi
Object Detector Differences when using Synthetic and Real Training Data
Martin Georg Ljungqvist, Otto Nordander, Markus Skans, Arvid Mildner, Tony Liu, Pierre Nugues