Synthetic Data
Synthetic data generation creates artificial datasets that mimic the statistical properties of real-world data, addressing data scarcity, privacy concerns, and high annotation costs. Current research focuses on generative models, including generative adversarial networks (GANs), energy-based models (EBMs), diffusion models, and masked language models, adapted to specific data types such as images, text, tabular data, and audio. The field matters across scientific domains and practical applications because it allows robust machine learning models to be trained where real data is insufficient or ethically problematic to collect, which can improve model performance and widen the range of feasible research.
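As a toy illustration of what "mimicking statistical properties" can mean for tabular data, the sketch below fits an independent Gaussian to each column of a small dataset and samples new rows from it. This is a deliberately simple stand-in, not a GAN, EBM, or diffusion model, and it ignores correlations between columns. The function names and the example height/weight values are hypothetical and chosen only for illustration.

```python
import random
import statistics

def fit_gaussian_columns(rows):
    """Estimate a per-column mean and sample standard deviation from real rows."""
    columns = list(zip(*rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]

def sample_synthetic(params, n, seed=0):
    """Draw n synthetic rows, sampling each column independently."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]

# Hypothetical "real" data: (height in cm, weight in kg) for five people.
real_rows = [
    [170.0, 65.0],
    [182.0, 80.0],
    [165.0, 58.0],
    [175.0, 72.0],
    [168.0, 63.0],
]

params = fit_gaussian_columns(real_rows)
synthetic_rows = sample_synthetic(params, n=1000)
```

With enough samples, the per-column means of `synthetic_rows` should land close to those of `real_rows`. Because each column is sampled independently, relationships between columns (here, taller people tending to weigh more) are lost, which is one reason the research above relies on more expressive generative models.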
Papers
Boundless: Generating Photorealistic Synthetic Data for Object Detection in Urban Streetscapes
Mehmet Kerem Turkcan, Yuyang Li, Chengbo Zang, Javad Ghaderi, Gil Zussman, Zoran Kostic
The Impact of Balancing Real and Synthetic Data on Accuracy and Fairness in Face Recognition
Andrea Atzori, Pietro Cosseddu, Gianni Fenu, Mirko Marras
Learning Privacy-Preserving Student Networks via Discriminative-Generative Distillation
Shiming Ge, Bochao Liu, Pengju Wang, Yong Li, Dan Zeng
LoGex: Improved tail detection of extremely rare histopathology classes via guided diffusion
Maximilian Mueller, Matthias Hein
ToolACE: Winning the Points of LLM Function Calling
Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, Qun Liu, Enhong Chen
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi
Self-Improving Diffusion Models with Synthetic Data
Sina Alemohammad, Ahmed Imtiaz Humayun, Shruti Agarwal, John Collomosse, Richard Baraniuk
SAU: A Dual-Branch Network to Enhance Long-Tailed Recognition via Generative Models
Guangxi Li, Yinsheng Song, Mingkai Zheng