Data Pruning

Data pruning is a technique for training machine learning models more efficiently by selectively removing less informative data points from large datasets. Current research focuses on pruning metrics and algorithms that identify and remove redundant or noisy data while preserving model accuracy and robustness, often leveraging language models, importance sampling, and clustering. Applications span image classification, natural language processing, and molecular modeling. By shrinking the training set, pruning can reduce training time and computational cost, which matters both for scaling deep learning research and for deploying models in resource-constrained settings.
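As a rough illustration of the idea, the sketch below shows score-based pruning next to a random-pruning baseline. It is a minimal example under stated assumptions, not the method of any paper listed here: the function names `prune_by_score` and `prune_randomly`, the `keep_fraction` parameter, and the toy scores are all hypothetical, and the per-example scores are assumed to come from some separate metric (for instance, a loss or difficulty estimate) that is not shown.

```python
import random


def prune_by_score(scores, keep_fraction, keep_highest=True):
    """Return the sorted indices of the examples to keep.

    scores: one importance score per training example (hypothetical metric).
    keep_fraction: fraction of the dataset to retain, in (0, 1].
    keep_highest: keep the highest-scoring examples if True, else the lowest.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n = len(scores)
    # Keep at least one example, and never more than the dataset holds.
    n_keep = min(n, max(1, round(keep_fraction * n)))
    # Rank example indices by score; sorted() is stable, so ties keep input order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=keep_highest)
    return sorted(ranked[:n_keep])


def prune_randomly(n_examples, keep_fraction, seed=0):
    """Baseline: keep a uniformly random subset of the same size."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = min(n_examples, max(1, round(keep_fraction * n_examples)))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(range(n_examples), n_keep))


# Toy scores for four examples (made up for illustration).
scores = [0.1, 0.9, 0.5, 0.7]
print(prune_by_score(scores, 0.5))                      # [1, 3]
print(prune_by_score(scores, 0.5, keep_highest=False))  # [0, 2]
print(prune_randomly(len(scores), 0.5))                 # two random indices
```

Whether to keep the highest- or lowest-scoring examples depends on the metric and on how much data is retained, which is why the sketch leaves that choice as a flag. Comparing against the random baseline at the same `keep_fraction` is a simple check on whether a given score is helping at all.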

Papers

February 23, 2023

Less is More: Data Pruning for Faster Adversarial Training
Yize Li, Pu Zhao, Xue Lin, Bhavya Kailkhura, Ryan Goldhahn
Deep Neural Network Adversarial Example Adversarial Training Data Pruning Deep Learning Acceleration

February 14, 2023

Data pruning and neural scaling laws: fundamental limitations of score-based algorithms
Fadhel Ayed, Soufiane Hayou
Fundamental Limitation Score Based Data Pruning Neural Scaling Law Random Pruning

December 20, 2022

Calibrating Deep Neural Networks using Explicit Regularisation and Dynamic Data Pruning
Ramya Hebbalaguppe, Rishabh Patra, Tirtharaj Dash, Gautam Shroff, Lovekesh Vig
Deep Learning Model Calibration Data Pruning Explicit Regularization Confidence Aware

July 1, 2022

Efficient Adversarial Training With Data Pruning
Maximilian Kaufmann, Yiren Zhao, Ilia Shumailov, Robert Mullins, Nicolas Papernot
Adversarial Example Adversarial Training Adversarial Sample Data Pruning Adversarial Evaluation Efficient Adversarial Training

June 29, 2022

Beyond neural scaling laws: beating power law scaling via data pruning
Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos
Large Scale Multiplicative Size Scaling Data Pruning Neural Scaling Law Power Law Scaling

November 24, 2021

Accelerating Deep Learning with Dynamic Data Pruning
Ravi S Raju, Kyle Daruwalla, Mikko Lipasti
Deep Learning Score Based Data Pruning Random Pruning