Data Quality Issue

Data quality issues pose a significant challenge across scientific domains because they undermine the reliability and accuracy of data-driven applications. Current research focuses on automated detection and correction of problems such as missing values, duplicates, inconsistencies, and outliers, often using hybrid approaches that combine statistical methods with machine learning algorithms to improve both accuracy and explainability. This work is crucial for the trustworthiness of AI models and the validity of scientific findings, particularly in high-stakes applications such as healthcare and environmental monitoring, where data quality directly affects decision-making. Developing tools and techniques to identify and mitigate these issues remains an active area of investigation.
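As a rough illustration of what automated detection can look like for three of the problems named above (missing values, duplicates, and outliers), here is a minimal Python sketch using only the standard library. It covers only the simple statistical side, not the hybrid machine learning approaches. The function names, the record layout, and the sample sensor readings are hypothetical, and the outlier rule (a modified z-score based on the median absolute deviation, with the conventional 0.6745 scaling and a 3.5 cutoff) is one assumed choice among many, not a method taken from any particular paper.

```python
from collections import Counter
from statistics import median


def find_missing(rows, fields):
    """Count, per field, the rows where the value is absent, None, or ''."""
    counts = Counter()
    for row in rows:
        for field in fields:
            if row.get(field) in (None, ""):
                counts[field] += 1
    return dict(counts)


def find_duplicates(rows, fields):
    """Return indices of rows that repeat an earlier row on the given fields."""
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        key = tuple(row.get(field) for field in fields)
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


def find_outliers(rows, field, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    The score is 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. Non-numeric and missing values are skipped.
    """
    values = [
        (index, row[field])
        for index, row in enumerate(rows)
        if isinstance(row.get(field), (int, float))
    ]
    if not values:
        return []
    center = median(value for _, value in values)
    mad = median(abs(value - center) for _, value in values)
    if mad == 0:
        # Assumption: with no spread there is no basis for flagging anything.
        return []
    return [
        index
        for index, value in values
        if 0.6745 * abs(value - center) / mad > threshold
    ]


# Hypothetical sensor readings: one gap, one repeated row, one implausible value.
readings = [
    {"id": 1, "temp": 20.1},
    {"id": 2, "temp": 19.8},
    {"id": 3, "temp": None},
    {"id": 2, "temp": 19.8},
    {"id": 4, "temp": 20.4},
    {"id": 5, "temp": 95.0},
    {"id": 6, "temp": 20.0},
]

missing = find_missing(readings, ["id", "temp"])
duplicates = find_duplicates(readings, ["id", "temp"])
outliers = find_outliers(readings, "temp")
```

A median-based rule is used here because a single extreme reading distorts the mean and standard deviation it would otherwise be judged against. Flagged rows are only candidates: deciding whether to correct, impute, or drop them depends on the domain.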

Papers