Data Pruning
Data pruning is a technique for training machine learning models more efficiently by selectively removing less informative data points from large datasets. Current research focuses on effective pruning metrics and algorithms, often built on language models, importance sampling, or clustering, that identify and remove redundant or noisy examples while preserving model accuracy and robustness. These methods have been applied to tasks including image classification, natural language processing, and molecular modeling. Shrinking the training set can substantially reduce training time and computational cost. This matters both for scaling deep learning research and for deploying applications under tight resource constraints.
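The overview does not commit to a specific pruning metric. The sketch below shows one score-based approach. It assumes you already have a trained classifier's logits for each example. The score is an EL2N-style error norm: the L2 distance between the softmax probabilities and the one-hot label. The original EL2N formulation averages this over several models early in training, and this sketch simplifies it to a single set of logits. The function names `el2n_score` and `prune_by_score` are hypothetical. The rule "keep the highest-scoring (hardest) examples" is one design choice among several, not a universal recipe.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def el2n_score(logits: Sequence[float], label: int) -> float:
    # Hypothetical helper: L2 distance between predicted class probabilities
    # and the one-hot encoding of the true label (single-model EL2N-style score).
    probs = softmax(logits)
    return math.sqrt(
        sum((p - (1.0 if k == label else 0.0)) ** 2 for k, p in enumerate(probs))
    )


def prune_by_score(scores: Sequence[float], keep_fraction: float) -> List[int]:
    # Hypothetical helper: keep the indices of the highest-scoring examples,
    # treating a high score as "more informative"; returned in original order.
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Toy two-class logits and labels (made-up values for illustration only).
    logits = [[4.0, 0.0], [0.5, 0.4], [0.0, 3.0], [1.0, 1.2]]
    labels = [0, 0, 1, 0]
    scores = [el2n_score(z, y) for z, y in zip(logits, labels)]
    kept = prune_by_score(scores, keep_fraction=0.5)
    print(kept)  # [1, 3]: the two examples the model is least sure about
```

In practice the kept indices would be used to build a smaller training subset before retraining. Whether to keep hard or easy examples, and what fraction to keep, would need to be tuned for the dataset at hand.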
Papers
May 29, 2024
May 10, 2024
April 8, 2024
March 12, 2024
December 21, 2023
December 5, 2023
November 2, 2023
October 23, 2023
October 11, 2023
September 21, 2023
September 8, 2023
August 2, 2023
June 25, 2023
June 5, 2023
May 28, 2023
March 26, 2023
March 8, 2023
February 23, 2023
February 14, 2023