Synthetic Data
Synthetic data generation aims to create artificial datasets that mimic the statistical properties of real-world data, addressing limitations like data scarcity, privacy concerns, and high annotation costs. Current research focuses on developing sophisticated generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, tailored to various data types (images, text, tabular data, audio). This rapidly evolving field significantly impacts diverse scientific domains and practical applications by enabling the training of robust machine learning models in situations where real data is insufficient or ethically problematic, ultimately improving model performance and expanding research possibilities.
Papers
GPT-FL: Generative Pre-trained Model-Assisted Federated Learning
Tuo Zhang, Tiantian Feng, Samiul Alam, Dimitrios Dimitriadis, Sunwoo Lee, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr
Word-Level Explanations for Analyzing Bias in Text-to-Image Models
Alexander Lin, Lucas Monteiro Paes, Sree Harsha Tanneru, Suraj Srinivas, Himabindu Lakkaraju
Generative Adversarial Networks for Data Augmentation
Angona Biswas, MD Abdullah Al Nasim, Al Imran, Anika Tabassum Sejuty, Fabliha Fairooz, Sai Puppala, Sajedul Talukder
Harnessing large-language models to generate private synthetic text
Alexey Kurakin, Natalia Ponomareva, Umar Syed, Liam MacDermed, Andreas Terzis
Privacy Distillation: Reducing Re-identification Risk of Multimodal Diffusion Models
Virginia Fernandez, Pedro Sanchez, Walter Hugo Lopez Pinaya, Grzegorz Jacenków, Sotirios A. Tsaftaris, Jorge Cardoso