Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to various data types (images, text, tabular data, audio). The field affects many scientific domains and practical applications because it enables training machine learning models where real data is insufficient or ethically problematic to use, which can improve model performance and widen the range of feasible research.
Papers
WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users
William Huang, Sam Ghahremani, Siyou Pei, Yang Zhang
Auto-Generating Weak Labels for Real & Synthetic Data to Improve Label-Scarce Medical Image Segmentation
Tanvi Deshpande, Eva Prakash, Elsie Gyang Ross, Curtis Langlotz, Andrew Ng, Jeya Maria Jose Valanarasu
Privacy-Preserving Statistical Data Generation: Application to Sepsis Detection
Eric Macias-Fassio, Aythami Morales, Cristina Pruenza, Julian Fierrez
Zero-Shot Distillation for Image Encoders: How to Make Effective Use of Synthetic Data
Niclas Popp, Jan Hendrik Metzen, Matthias Hein
Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums
Isabelle Lorge, Dan W. Joyce, Andrey Kormilitzin
Better Synthetic Data by Retrieving and Transforming Existing Datasets
Saumya Gandhi, Ritu Gala, Vijay Viswanathan, Tongshuang Wu, Graham Neubig
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, and 120 others
Multi-objective evolutionary GAN for tabular data synthesis
Nian Ran, Bahrul Ilmi Nasution, Claire Little, Richard Allmendinger, Mark Elliot
VFLGAN: Vertical Federated Learning-based Generative Adversarial Network for Vertically Partitioned Data Publication
Xun Yuan, Yang Yang, Prosanta Gope, Aryan Pasikhani, Biplab Sikdar
Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model
Hyunsoo Cho
Towards Sim-to-Real Industrial Parts Classification with Synthetic Dataset
Xiaomeng Zhu, Talha Bilal, Pär Mårtensson, Lars Hanson, Mårten Björkman, Atsuto Maki
Scalability in Building Component Data Annotation: Enhancing Facade Material Classification with Synthetic Data
Josie Harrison, Alexander Hollberg, Yinan Yu