Paper ID: 2307.04468

Badgers: generating data quality deficits with Python

Julien Siebert, Daniel Seifert, Patricia Kelbert, Michael Kläs, Adam Trendowicz

Generating context-specific data quality deficits is necessary to experimentally assess the data quality of data-driven applications, i.e., those based on artificial intelligence (AI) or machine learning (ML). In this paper, we present badgers, an extensible open-source Python library for generating data quality deficits (outliers, imbalanced data, drift, etc.) for different modalities (tabular data, time series, text, etc.). The documentation (https://fraunhofer-iese.github.io/badgers/) and the source code (https://github.com/Fraunhofer-IESE/badgers) are available online.
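
To make the idea of a generated data quality deficit concrete, the following is a minimal sketch of injecting outliers into tabular data. It uses plain NumPy and does not use the badgers API; the function name `inject_outliers` and its parameters (`fraction`, `scale`, `seed`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def inject_outliers(X, fraction=0.1, scale=5.0, seed=0):
    """Return a copy of X in which a fraction of rows is replaced by outliers.

    Hypothetical helper for illustration only; not part of the badgers API.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    n_outliers = int(round(fraction * n_rows))

    # Pick distinct rows to corrupt.
    idx = rng.choice(n_rows, size=n_outliers, replace=False)

    mean = X.mean(axis=0)
    std = X.std(axis=0)

    # Place each chosen row `scale` standard deviations away from the
    # column means, with a random sign per feature.
    signs = rng.choice([-1.0, 1.0], size=(n_outliers, n_cols))
    X_out = X.copy()
    X_out[idx] = mean + signs * scale * std
    return X_out, np.sort(idx)


# Usage: corrupt 10% of the rows of a clean Gaussian sample.
rng = np.random.default_rng(42)
X_clean = rng.normal(size=(100, 2))
X_dirty, outlier_idx = inject_outliers(X_clean, fraction=0.1)
```

Returning the indices of the corrupted rows alongside the data keeps the ground truth available, which is what makes a controlled experiment on the effect of the deficit possible.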

Submitted: Jul 10, 2023