Data Quality
Data quality, encompassing the accuracy, completeness, consistency, and timeliness of data, is crucial for reliable machine learning model performance and trustworthy AI applications. Current research focuses on developing automated methods for detecting and correcting data quality issues, including synthetic data generation, data augmentation, and the use of machine learning models themselves to refine datasets (e.g., using smaller models to curate or filter the training data of larger ones). These efforts are driven by the need to improve the accuracy and robustness of AI systems across diverse fields, from the social sciences and finance to healthcare and particle physics, where high-quality data is essential for reliable insights and decision-making.
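To make the detection side of this concrete, the following is a minimal sketch, assuming a tabular dataset in pandas, of automated checks along the completeness, consistency, and timeliness dimensions mentioned above; the column names, threshold, and example data are hypothetical and only illustrate the idea.

```python
import pandas as pd


def audit_data_quality(df: pd.DataFrame, timestamp_col: str, max_staleness_days: int = 30) -> dict:
    """Run simple completeness, consistency, and timeliness checks on a DataFrame."""
    report = {}

    # Completeness: fraction of missing values per column.
    report["missing_fraction"] = df.isna().mean().to_dict()

    # Consistency: count of exact duplicate rows.
    report["duplicate_rows"] = int(df.duplicated().sum())

    # Timeliness: records older than the allowed staleness window.
    timestamps = pd.to_datetime(df[timestamp_col], errors="coerce")
    cutoff = pd.Timestamp.now() - pd.Timedelta(days=max_staleness_days)
    report["stale_records"] = int((timestamps < cutoff).sum())

    return report


# Example usage with a small hypothetical events table.
events = pd.DataFrame({
    "user_id": [1, 2, 2, None],
    "value": [10.0, None, 5.0, 5.0],
    "recorded_at": ["2024-01-01", "2024-06-01", "2024-06-01", "2025-01-01"],
})
print(audit_data_quality(events, timestamp_col="recorded_at"))
```

Checks like these are typically the first stage of a data quality pipeline; the correction techniques surveyed above (augmentation, synthetic data, model-assisted refinement) act on the issues such an audit surfaces.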
Papers
Spatio-Temporal Anomaly Detection with Graph Networks for Data Quality Monitoring of the Hadron Calorimeter
Mulugeta Weldezgina Asres, Christian Walter Omlin, Long Wang, David Yu, Pavel Parygin, Jay Dittmann, Georgia Karapostoli, Markus Seidel, Rosamaria Venditti, Luka Lambrecht, Emanuele Usai, Muhammad Ahmad, Javier Fernandez Menendez, Kaori Maeshima, the CMS-HCAL Collaboration
Exploring Dataset-Scale Indicators of Data Quality
Benjamin Feuer, Chinmay Hegde