Paper ID: 2310.11546

Bias and Error Mitigation in Software-Generated Data: An Advanced Search and Optimization Framework Leveraging Generative Code Models

Ernesto Giralt Hernández

Data generation and analysis is a fundamental aspect of many industries and disciplines, from strategic decision making in business to research in the physical and social sciences. However, data generated using software and algorithms can be subject to biases and errors. These can be due to problems with the original software, default settings that do not align with the specific needs of the situation, or even deeper problems with the underlying theories and models. This paper proposes an advanced search and optimization framework aimed at generating and choosing optimal source code capable of correcting errors and biases from previous versions to address typical problems in software systems specializing in data analysis and generation, especially those in the corporate and data science world. Applying this framework multiple times on the same software system would incrementally improve the quality of the output results. It uses Solomonoff Induction as a sound theoretical basis, extending it with Kolmogorov Conditional Complexity, a novel adaptation, to evaluate a set of candidate programs. We propose the use of generative models for the creation of this set of programs, with special emphasis on the capabilities of Large Language Models (LLMs) to generate high quality code.

Submitted: Oct 17, 2023