Synthetic Data
Synthetic data generation aims to create artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models tailored to different data types (images, text, tabular data, audio), including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models. This rapidly evolving field affects many scientific domains and practical applications. It makes it possible to train robust machine learning models where real data is insufficient or ethically problematic to use, improving model performance and opening new research directions.
Papers
Leveraging Synthetic Data for Generalizable and Fair Facial Action Unit Detection
Liupei Lu, Yufeng Yin, Yuming Gu, Yizhen Wu, Pratusha Prasad, Yajie Zhao, Mohammad Soleymani
Structured Evaluation of Synthetic Tabular Data
Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, Patrick Shafto
Federated Data Model
Xiao Chen, Shunan Zhang, Eric Z. Chen, Yikang Liu, Lin Zhao, Terrence Chen, Shanhui Sun
Historical Astronomical Diagrams Decomposition in Geometric Primitives
Syrine Kalleli, Scott Trigg, Ségolène Albouy, Mathieu Husson, Mathieu Aubry
Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records
Erlend Frayling, Jake Lever, Graham McDonald
Quantifying and Mitigating Privacy Risks for Tabular Generative Models
Chaoyi Zhu, Jiayi Tang, Hans Brouwer, Juan F. Pérez, Marten van Dijk, Lydia Y. Chen
Joint Selection: Adaptively Incorporating Public Information for Private Synthetic Data
Miguel Fuentes, Brett Mullins, Ryan McKenna, Gerome Miklau, Daniel Sheldon
Fine-grainedly Synthesize Streaming Data Based On Large Language Models With Graph Structure Understanding For Data Sparsity
Xin Zhang, Linhai Zhang, Deyu Zhou, Guoqiang Xu
Towards In-Vehicle Multi-Task Facial Attribute Recognition: Investigating Synthetic Data and Vision Foundation Models
Esmaeil Seraj, Walter Talamonti
Synthetic Privileged Information Enhances Medical Image Representation Learning
Lucas Farndale, Chris Walsh, Robert Insall, Ke Yuan
Synthetic data generation for system identification: leveraging knowledge transfer from similar systems
Dario Piga, Matteo Rufolo, Gabriele Maroni, Manas Mejari, Marco Forgione
Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data
Sagi Eppel, Jolina Li, Manuel Drehwald, Alan Aspuru-Guzik
From Noise to Signal: Unveiling Treatment Effects from Digital Health Data through Pharmacology-Informed Neural-SDE
Samira Pakravan, Nikolaos Evangelou, Maxime Usdin, Logan Brooks, James Lu
Towards Foundation Time Series Model: To Synthesize Or Not To Synthesize?
Kseniia Kuvshinova, Olga Tsymboi, Alina Kostromina, Dmitry Simakov, Elizaveta Kovtun
Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground
Adil Soubki, John Murzaku, Arash Yousefi Jordehi, Peter Zeng, Magdalena Markowska, Seyed Abolghasem Mirroshandel, Owen Rambow