Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to various data types (images, text, tabular data, audio). The field is evolving rapidly and affects many scientific domains and practical applications: it enables the training of robust machine learning models where real data is insufficient or ethically problematic, which can improve model performance and expand research possibilities.
Papers
Low-Shot Learning for Fictional Claim Verification
Viswanath Chadalapaka, Derek Nguyen, JoonWon Choi, Shaunak Joshi, Mohammad Rostami
ECG Feature Importance Rankings: Cardiologists vs. Algorithms
Temesgen Mehari, Ashish Sundar, Alen Bosnjakovic, Peter Harris, Steven E. Williams, Axel Loewe, Olaf Doessel, Claudia Nagel, Nils Strodthoff, Philip J. Aston
How far generated data can impact Neural Networks performance?
Sayeh Gholipour Picha, Dawood AL Chanti, Alice Caplier
Knowing the Distance: Understanding the Gap Between Synthetic and Real Data For Face Parsing
Eli Friedman, Assaf Lehr, Alexey Gruzdev, Vladimir Loginov, Max Kogan, Moran Rubin, Orly Zvitia
Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis
Hadrien Reynaud, Mengyun Qiao, Mischa Dombrowski, Thomas Day, Reza Razavi, Alberto Gomez, Paul Leeson, Bernhard Kainz
Synthetic Health-related Longitudinal Data with Mixed-type Variables Generated using Diffusion Models
Nicholas I-Hsien Kuo, Louisa Jorm, Sebastiano Barbieri
A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
Hui Tang, Kui Jia
Generating synthetic multi-dimensional molecular-mediator time series data for artificial intelligence-based disease trajectory forecasting and drug development digital twins: Considerations
Gary An, Chase Cockrell