Data Generation
Data generation is a rapidly evolving field concerned with creating artificial datasets to work around limits on real-world data acquisition, such as cost, privacy concerns, and scarcity. Current research emphasizes large language models (LLMs) and diffusion models as generators of diverse, realistic synthetic data, used for example to train machine learning models for image recognition, natural language processing, and anomaly detection. The work matters most where sufficient real-world data is hard to obtain: there, synthetic data can improve model performance and extend machine learning methods to more scientific and practical domains.
Papers
Improved Data Generation for Enhanced Asset Allocation: A Synthetic Dataset Approach for the Fixed Income Universe
Szymon Kubiak, Tillman Weyde, Oleksandr Galkin, Dan Philps, Ram Gopal
Data Generation for Post-OCR correction of Cyrillic handwriting
Evgenii Davydkin, Aleksandr Markelov, Egor Iuldashev, Anton Dudkin, Ivan Krivorotov
Diffusion model based data generation for partial differential equations
Rucha Apte, Sheel Nidhan, Rishikesh Ranade, Jay Pathak
causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery
Konstantin Göbler, Tobias Windisch, Mathias Drton, Tim Pychynski, Steffen Sonntag, Martin Roth