Data Cleaning
Data cleaning improves the quality and reliability of datasets used in machine learning and other data-driven applications by identifying and correcting errors, inconsistencies, and redundancies. Current research emphasizes efficient and scalable methods, including neural networks, ensemble techniques, and large language models (LLMs), for tasks such as outlier detection, label correction, and handling missing data. These advances matter because the performance and trustworthiness of machine learning models depend on data quality in fields ranging from climate science and medicine to natural language processing and code generation.
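As a rough illustration of three of the tasks named above (removing redundancies, handling missing data, and outlier detection), the following is a minimal sketch using only the Python standard library. It is not taken from any of the listed papers. The function names, the `temp` field, the sample rows, and the 3.5 threshold on the modified z-score are all assumptions chosen for illustration; the research described here uses far more sophisticated methods.

```python
from statistics import median


def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, field):
    """Replace missing (None) values of `field` with the median of observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def flag_outliers(rows, field, threshold=3.5):
    """Return rows whose modified z-score (based on the median absolute
    deviation, MAD) exceeds `threshold`. The threshold is an assumed default."""
    values = [r[field] for r in rows]
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against; flag nothing
    return [r for r in rows if 0.6745 * abs(r[field] - center) / mad > threshold]


# Hypothetical sample data: one duplicate, one missing value, one extreme reading.
rows = [
    {"id": 1, "temp": 20.0},
    {"id": 2, "temp": 21.0},
    {"id": 2, "temp": 21.0},
    {"id": 3, "temp": None},
    {"id": 4, "temp": 19.0},
    {"id": 5, "temp": 95.0},
]

cleaned = impute_median(drop_duplicates(rows), "temp")
outliers = flag_outliers(cleaned, "temp")
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value would otherwise distort the very statistics used to detect it. Flagged rows are returned rather than deleted, since whether an outlier is an error or a genuine observation usually depends on the domain.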