Data Generation
Data generation is a rapidly evolving field focused on creating synthetic datasets to address limitations in real-world data acquisition, such as cost, privacy concerns, and scarcity. Current research emphasizes using large language models (LLMs) and diffusion models to generate diverse, realistic synthetic data for training machine learning models on tasks such as image recognition, natural language processing, and anomaly detection. This work matters most in areas where sufficient real-world data is hard to obtain, where it can improve model performance and extend applicability across scientific and practical domains.
Papers
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
Hanwen Jiang, Zexiang Xu, Desai Xie, Ziwen Chen, Haian Jin, Fujun Luan, Zhixin Shu, Kai Zhang, Sai Bi, Xin Sun, Jiuxiang Gu, Qixing Huang, Georgios Pavlakos, Hao Tan
RadField3D: A Data Generator and Data Format for Deep Learning in Radiation-Protection Dosimetry for Medical Applications
Felix Lehner, Pasquale Lombardo, Susana Castillo, Oliver Hupe, Marcus Magnor