Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing limitations such as data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to different data types (images, text, tabular data, audio). The field is evolving rapidly and matters across scientific domains and practical applications because it allows robust machine learning models to be trained where real data is insufficient or ethically problematic, which can improve model performance and open up research that would otherwise be infeasible.
Papers
SelectLLM: Can LLMs Select Important Instructions to Annotate?
Ritik Sachin Parkar, Jaehyung Kim, Jong Inn Park, Dongyeop Kang
Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation
Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez
AdvNF: Reducing Mode Collapse in Conditional Normalising Flows using Adversarial Learning
Vikas Kanaujia, Mathias S. Scheurer, Vipul Arora
Rethinking Personalized Federated Learning with Clustering-based Dynamic Graph Propagation
Jiaqi Wang, Yuzhong Chen, Yuhang Wu, Mahashweta Das, Hao Yang, Fenglong Ma
Importance-Aware Adaptive Dataset Distillation
Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama
Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter
Dongmyoung Lee, Wei Chen, Nicolas Rojas
Generating Synthetic Health Sensor Data for Privacy-Preserving Wearable Stress Detection
Lucas Lange, Nils Wenzlitschke, Erhard Rahm
Can I trust my fake data -- A comprehensive quality assessment framework for synthetic tabular data in healthcare
Vibeke Binz Vallevik, Aleksandar Babic, Serena Elizabeth Marshall, Severin Elvatun, Helga Brøgger, Sharmini Alagaratnam, Bjørn Edwin, Narasimha Raghavan Veeraragavan, Anne Kjersti Befring, Jan Franz Nygård
Towards Multi-domain Face Landmark Detection with Synthetic Data from Diffusion model
Yuanming Li, Gwantae Kim, Jeong-gi Kwak, Bon-hwa Ku, Hanseok Ko
An Augmented Surprise-guided Sequential Learning Framework for Predicting the Melt Pool Geometry
Ahmed Shoyeb Raihan, Hamed Khosravi, Tanveer Hossain Bhuiyan, Imtiaz Ahmed
Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNN
Muhammad Ali Farooq, Wang Yao, Michael Schukat, Mark A Little, Peter Corcoran