Real Program Distribution

Research on real program distributions aims to characterize the programs that software tools actually encounter in practice, so that tools such as code generation models and scientific simulators can be evaluated and developed against realistic inputs. Current efforts concentrate on building benchmarks that better reflect the diversity of programming languages, project sizes, and dependency structures found in real code. Methodologically, this work often uses maximum entropy methods to handle uncertainty in parameter estimation and distances such as the sliced-Wasserstein distance to compare distributions. By supplying more realistic and representative input distributions, this line of work aims to improve the reliability and performance of AI-powered code generation and the accuracy and robustness of scientific simulations.
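The summary names the sliced-Wasserstein distance but does not describe how it is computed. The following is a minimal pure-Python sketch of the standard Monte Carlo estimator, not the method of any specific paper listed here. It assumes two equal-size point clouds with uniform weights, for example feature vectors extracted from two program collections (a hypothetical use). Each cloud is projected onto random unit directions. The 1-D Wasserstein distance is computed by matching sorted projections, and the results are averaged over directions. The function names `sliced_wasserstein`, `wasserstein_1d`, and `random_unit_vector` are illustrative.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def random_unit_vector(dim: int, rng: random.Random) -> List[float]:
    """Draw a direction uniformly on the unit sphere by normalising a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:  # retry in the (practically impossible) all-zero case
            return [x / norm for x in v]


def wasserstein_1d(a: List[float], b: List[float], p: float) -> float:
    """p-th power of the 1-D p-Wasserstein distance between equal-size samples.

    For uniform weights and equal sample sizes, the optimal 1-D coupling
    pairs the i-th smallest value of `a` with the i-th smallest value of `b`.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    return sum(abs(x - y) ** p for x, y in zip(a_sorted, b_sorted)) / len(a_sorted)


def sliced_wasserstein(xs: Sequence[Point], ys: Sequence[Point],
                       n_projections: int = 200, p: float = 2.0,
                       seed: int = 0) -> float:
    """Monte Carlo estimate of the sliced p-Wasserstein distance (hypothetical helper)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    dim = len(xs[0])
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    total = 0.0
    for _ in range(n_projections):
        theta = random_unit_vector(dim, rng)
        # Project every point onto the direction theta (a dot product).
        proj_x = [sum(t * c for t, c in zip(theta, x)) for x in xs]
        proj_y = [sum(t * c for t, c in zip(theta, y)) for y in ys]
        total += wasserstein_1d(proj_x, proj_y, p)
    # Average the p-th powers over directions, then take the p-th root.
    return (total / n_projections) ** (1.0 / p)
```

As a sanity check, when one cloud is the other translated by a vector c in d dimensions, every projection shifts by θ·c. With p = 2, the estimate therefore approaches |c|/√d as the number of projections grows. In one dimension it equals |c| exactly up to rounding. Sampling random projections rather than solving the full d-dimensional transport problem is what keeps the cost to sorting, roughly O(n log n) per direction.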

Papers