Data Subset Selection

Data subset selection aims to identify a small, representative portion of a large dataset such that a model trained on it performs comparably to one trained on the full dataset, reducing the computational cost of training. Current research focuses on algorithms that generalize across model architectures, remain effective across a wide range of data reduction ratios, and ground selection in information-theoretic principles or gradient-based criteria. These advances can accelerate model training, hyperparameter tuning, and active learning, lowering the time and cost of machine learning applications across domains.
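To make the gradient-based idea concrete, the following is a minimal sketch, not any specific published algorithm. It assumes a logistic-regression model and greedily picks examples whose mean gradient stays close to the mean gradient of the full dataset. The function names (`per_sample_gradients`, `greedy_gradient_matching`), the synthetic data, and the subset size of 50 are all illustrative assumptions.

```python
import numpy as np

def per_sample_gradients(X, y, w):
    """Per-example gradient of the logistic loss with respect to w."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (p - y)[:, None] * X  # shape (n_samples, n_features)

def greedy_gradient_matching(grads, k):
    """Greedily pick k rows whose mean approximates the full mean gradient."""
    target = grads.mean(axis=0)
    selected = []
    running_sum = np.zeros_like(target)
    available = np.ones(len(grads), dtype=bool)
    for step in range(1, k + 1):
        # Error of the subset mean if each candidate were added next.
        candidate_means = (running_sum + grads) / step
        errors = np.linalg.norm(candidate_means - target, axis=1)
        errors[~available] = np.inf  # never pick the same example twice
        best = int(np.argmin(errors))
        selected.append(best)
        running_sum += grads[best]
        available[best] = False
    return np.array(selected)

# Synthetic binary classification data (assumed purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
true_w = rng.normal(size=10)
y = (X @ true_w + 0.1 * rng.normal(size=500) > 0).astype(float)
w = np.zeros(10)  # gradients are evaluated at an untrained model

grads = per_sample_gradients(X, y, w)
subset = greedy_gradient_matching(grads, k=50)  # a 10x data reduction

full_mean = grads.mean(axis=0)
subset_error = np.linalg.norm(grads[subset].mean(axis=0) - full_mean)
print(f"selected {len(subset)} of {len(X)} examples, "
      f"gradient mismatch = {subset_error:.4f}")
```

In this sketch the gradients are computed once at a fixed parameter vector; a practical method would likely need to refresh the selection as the model trains, since the per-example gradients change with the weights.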

Papers