Data Cleaning

Data cleaning improves the quality and reliability of datasets used in machine learning and other data-driven applications by identifying and correcting errors, inconsistencies, and redundancies. Current research emphasizes efficient, scalable methods, including neural networks, ensemble techniques, and large language models (LLMs), for tasks such as outlier detection, label correction, and missing-data imputation. These advances strengthen the performance and trustworthiness of machine learning models across diverse fields, from climate science and medicine to natural language processing and code generation, ultimately leading to more reliable and impactful applications.
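Two of the tasks named above, outlier detection and missing-data handling, can be illustrated with a minimal sketch. The function below is a hypothetical example, not drawn from any paper in this collection: it imputes missing values with the column median and flags outliers using a robust, MAD-based modified z-score (the 0.6745 factor rescales the MAD to be comparable to a standard deviation under normality).

```python
from statistics import median

def clean_column(values, z_thresh=3.5):
    # Hypothetical helper: median imputation plus robust
    # (median-absolute-deviation) outlier replacement.
    observed = [v for v in values if v is not None]
    med = median(observed)
    mad = median(abs(v - med) for v in observed)
    cleaned = []
    for v in values:
        if v is None:
            cleaned.append(med)  # impute missing value with the median
        elif mad > 0 and 0.6745 * abs(v - med) / mad > z_thresh:
            cleaned.append(med)  # replace extreme outlier with the median
        else:
            cleaned.append(v)
    return cleaned

# The corrupted reading (500.0) and the missing entry are both
# replaced by the column median.
print(clean_column([10.2, 9.8, None, 10.1, 500.0, 9.9]))
```

Median and MAD are used rather than mean and standard deviation because, on small samples, a single extreme value inflates the standard deviation enough to mask itself from a plain z-score test.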

Papers