Data Synthesis
Data synthesis generates artificial datasets that mimic the statistical properties and structure of real-world data, primarily to address data scarcity, privacy concerns, and the need for diverse training data in machine learning. Current research emphasizes complex data types, including relational databases and time series, and often employs generative models such as diffusion models and large language models (LLMs) to achieve high fidelity and utility. These techniques are proving valuable in applications ranging from improving the performance of LLMs and vision systems to enhancing medical image analysis and enabling privacy-preserving data sharing. The field is also developing robust evaluation metrics and methods to ensure the quality and reliability of synthetic data.
Papers
Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving
Sihao Wu, Jiaxu Liu, Xiangyu Yin, Guangliang Cheng, Meng Fang, Xingyu Zhao, Xinping Yi, Xiaowei Huang
Mastering the Craft of Data Synthesis for CodeLLMs
Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Thien Nguyen+8