Data Pruning
Data pruning is a technique for training machine learning models more efficiently by selectively removing less informative data points from large datasets. Current research focuses on developing effective pruning metrics and algorithms, often leveraging language models, importance sampling, and clustering techniques, to identify and remove redundant or noisy data while preserving model accuracy and robustness. These methods have been applied to tasks including image classification, natural language processing, and molecular modeling. By shrinking the training set, data pruning can substantially reduce training time and computational cost, which improves the scalability of deep learning research and makes deployment in resource-constrained settings more practical.
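As a rough illustration of the idea, the sketch below shows static, score-based pruning: each example receives an informativeness score, and only the top-scoring fraction is kept for training. The function name `prune_by_score`, the `keep_fraction` parameter, and the scores themselves are hypothetical and used only for illustration. How the scores are computed (for example from model loss, an importance-sampling weight, or distance to a cluster centroid) is left open, and the papers listed below may use quite different procedures.

```python
def prune_by_score(examples, scores, keep_fraction):
    """Keep the highest-scoring fraction of examples, preserving their original order.

    Assumption: a higher score means a more informative example.
    """
    if len(examples) != len(scores):
        raise ValueError("examples and scores must have the same length")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Number of examples to retain (at least one when the dataset is non-empty).
    n_keep = max(1, round(len(examples) * keep_fraction))

    # Rank indices from most to least informative.
    ranked = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)

    # Restore dataset order among the retained indices.
    kept = sorted(ranked[:n_keep])
    return [examples[i] for i in kept]


# Toy data: four examples with made-up scores, keeping half of them.
examples = ["a", "b", "c", "d"]
scores = [0.1, 0.9, 0.5, 0.7]
pruned = prune_by_score(examples, scores, keep_fraction=0.5)
print(pruned)
```

With the toy scores above, the two highest-scoring examples are retained. A dynamic variant would recompute the scores and re-select the subset periodically during training rather than once up front.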
Papers
Dynamic Data Pruning for Automatic Speech Recognition
Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez, Boqian Wu, Lu Yin, Stavros Petridis, Mykola Pechenizkiy, Maja Pantic, Decebal Constantin Mocanu, Shiwei Liu
FactFinders at CheckThat! 2024: Refining Check-worthy Statement Detection with LLMs through Data Pruning
Yufeng Li, Rrubaa Panchendrarajan, Arkaitz Zubiaga