Data Corruption

Data corruption, meaning errors and inconsistencies in datasets, is a significant challenge across many machine learning applications. Current research develops robust algorithms and model architectures to limit how much corrupted data degrades model performance and reliability. These include approaches based on sequence modeling and robust statistical methods (e.g., Huber loss, quantile estimators). This work matters for the trustworthiness and generalizability of AI systems in domains where data quality strongly affects the accuracy and usefulness of results, from reinforcement learning and hypothesis testing to natural language processing and building energy assessment.
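To illustrate why robust statistics such as the Huber loss and quantile estimators help, here is a minimal sketch under simplifying assumptions. It uses a toy one-dimensional location-estimation problem with a single corrupted value. The function names (`huber_loss`, `squared_loss`, `huber_location`) and the data are hypothetical and illustrative only, not taken from any particular paper or library. The Huber loss grows quadratically for small residuals and only linearly beyond a threshold `delta`, so one extreme value cannot dominate the fit. The median (the 0.5-quantile) is a simple quantile estimator with the same robustness property.

```python
from statistics import mean, median


def squared_loss(r: float) -> float:
    """Squared loss: penalty grows quadratically, so outliers dominate."""
    return 0.5 * r * r


def huber_loss(r: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def huber_location(xs: list[float], delta: float = 1.0, iters: int = 50) -> float:
    """Huber M-estimate of location via iteratively reweighted averaging.

    Each point gets weight psi(r)/r: 1 inside the delta band, delta/|r| outside,
    so far-away (possibly corrupted) points are down-weighted.
    """
    mu = median(xs)  # robust starting point (0.5-quantile)
    for _ in range(iters):
        weights = [
            1.0 if abs(x - mu) <= delta else delta / abs(x - mu) for x in xs
        ]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < 1e-12:
            break
        mu = new_mu
    return mu


if __name__ == "__main__":
    # Hypothetical measurements; the last value simulates a corrupted record.
    data = [1.0, 1.2, 0.9, 1.1, 1.0, 100.0]
    print(f"mean:   {mean(data):.2f}")            # pulled far toward the outlier
    print(f"median: {median(data):.2f}")          # 0.5-quantile, barely affected
    print(f"huber:  {huber_location(data):.2f}")  # bounded influence of the outlier
    print(squared_loss(3.0), huber_loss(3.0))     # 4.5 vs 2.5 for a large residual
```

Running this prints a mean of about 17.53, while the median (1.05) and the Huber estimate (1.24) stay close to the clean values. The choice of `delta` is an assumption here. It sets where the loss switches from quadratic to linear and in practice is usually tuned relative to the noise scale of the clean data.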

Papers