Data-Centric Learning
Data-centric learning improves machine learning model performance by raising the quality and utility of datasets, rather than focusing solely on model architecture. Current research emphasizes curriculum learning to optimize the order in which training examples are presented, dataset condensation to create smaller but still representative datasets, and methods for exploiting unlabeled data, often with diffusion models. The approach has proved valuable across diverse applications: it improves large language models and image recognition, and it makes machine learning more accurate and reliable in earth observation and entity resolution by narrowing the gap between training data and real-world data.
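As a rough illustration of the curriculum learning idea mentioned above, the sketch below orders examples from easy to hard and exposes them to training in growing pools. The function name `curriculum_stages`, the word-count difficulty score, and the cumulative staging scheme are all assumptions made for illustration; they are not taken from any particular paper on this topic.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    num_stages: int,
) -> List[List[T]]:
    """Return cumulative training pools, ordered from easiest to hardest.

    Stage k contains roughly the easiest k/num_stages fraction of the data,
    so the final stage always contains every example.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")

    # Sort once by the (hypothetical) difficulty score, easiest first.
    ordered = sorted(examples, key=difficulty)
    n = len(ordered)

    stages: List[List[T]] = []
    for stage in range(1, num_stages + 1):
        # Integer ceiling of n * stage / num_stages.
        cutoff = (n * stage + num_stages - 1) // num_stages
        stages.append(ordered[:cutoff])
    return stages


# Toy usage: treat the number of words in a sentence as its difficulty (an assumption).
sentences = ["a b c d e", "a", "a b c", "a b"]
pools = curriculum_stages(sentences, lambda s: len(s.split()), num_stages=2)
for index, pool in enumerate(pools, start=1):
    print(f"stage {index}: {pool}")
```

In practice the difficulty score would usually come from a signal such as a reference model's loss or annotator agreement rather than sentence length, and each pool would be passed to an ordinary training loop in turn.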
Papers
November 11, 2024
May 13, 2024
April 21, 2024
December 8, 2023
November 6, 2023
March 17, 2023
November 20, 2021