Data-Centric Learning

Data-centric learning prioritizes improving the quality and utility of datasets to enhance machine learning model performance, rather than focusing solely on model architecture. Current research emphasizes curriculum learning, which optimizes the order in which training examples are presented; dataset condensation, which creates smaller, representative datasets; and methods that leverage unlabeled data effectively, often by employing diffusion models. The approach is proving valuable across diverse applications: it improves large language models and image recognition, and it enhances the accuracy and reliability of machine learning in earth observation and entity resolution by bridging the gap between training data and real-world data.
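
As an illustration of the curriculum learning idea mentioned above, the following is a minimal sketch, not a method taken from any particular paper. It assumes each sample can be assigned a scalar difficulty score. The function name `curriculum_batches` and the parameters `difficulty`, `num_stages`, and `seed` are hypothetical and chosen only for this example. Training proceeds in stages, and each stage draws from a larger, harder portion of the data than the one before.

```python
import random


def curriculum_batches(samples, difficulty, batch_size, num_stages=3, seed=0):
    """Yield (stage, batch) pairs, easiest data first.

    Stage k draws from the easiest k/num_stages fraction of the samples,
    so the final stage covers the whole dataset. `difficulty` is a
    user-supplied scoring function (an assumption of this sketch).
    """
    rng = random.Random(seed)
    # Sort once from easiest to hardest according to the scoring function.
    ordered = sorted(samples, key=difficulty)
    for stage in range(1, num_stages + 1):
        # Grow the pool each stage; never let it drop below one batch.
        cutoff = max(batch_size, len(ordered) * stage // num_stages)
        pool = ordered[:cutoff]  # slicing copies, so `ordered` stays sorted
        rng.shuffle(pool)        # shuffle within the stage to avoid a fixed order
        for start in range(0, len(pool), batch_size):
            yield stage, pool[start:start + batch_size]
```

For example, with `difficulty=len` on a list of strings, the first stage contains only the shortest strings and later stages mix in progressively longer ones. In practice the difficulty score might come from a loss value or a heuristic such as sequence length; which score works best depends on the task.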

Papers