Data Diet

"Data diet" research explores how selectively pruning training datasets can make machine learning model training more efficient without sacrificing accuracy. Current work focuses on pruning strategies, often built on gradient-based metrics, that identify and remove uninformative or even detrimental data points, with applications including medical image segmentation, natural language processing, and bias mitigation. The approach holds promise for reducing computational costs, improving model generalization, and mitigating bias.
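
To make the idea of a gradient-based pruning metric concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method of any particular paper: the model is a plain logistic regression, each example is scored by the L2 norm of its loss gradient, and the lowest-scoring examples are dropped. The names `gradient_norm`, `prune`, and `keep_fraction`, the synthetic data, and the choice to keep high-scoring examples are all hypothetical.

```python
import math
import random

def gradient_norm(x, y, w):
    """L2 norm of the logistic-loss gradient for one example (x, y)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
    # The gradient with respect to w is (p - y) * x, so its norm factorizes.
    return abs(p - y) * math.sqrt(sum(xi * xi for xi in x))

def prune(examples, w, keep_fraction=0.5):
    """Return indices of the highest-scoring examples, in original order."""
    scores = [gradient_norm(x, y, w) for x, y in examples]
    n_keep = max(1, round(keep_fraction * len(examples)))
    ranked = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])

# Synthetic data (assumption): the label depends only on the first feature.
random.seed(0)
examples = []
for _ in range(200):
    x = [random.gauss(0.0, 1.0) for _ in range(5)]
    examples.append((x, 1.0 if x[0] > 0 else 0.0))

w = [0.0] * 5  # hypothetical weights, e.g. a snapshot from early in training
kept = prune(examples, w, keep_fraction=0.25)
print(len(kept), "of", len(examples), "examples kept")
```

Which end of the score distribution to discard, and at what point in training to compute the scores, are design choices that vary between pruning strategies; the sketch fixes one option only for illustration.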

Papers