Dataset Distillation
Dataset distillation aims to create small synthetic datasets that retain the essential information of much larger original datasets, reducing the computational and storage costs of training deep learning models. Current research focuses on making these synthetic datasets cheaper to produce and better at generalizing across model architectures, often using techniques such as knowledge distillation, generative models (e.g., diffusion models, GANs), and trajectory matching. The field matters because it addresses the growing data volume and computational expense of deep learning, with potential applications ranging from faster training to data sharing in resource-constrained settings such as TinyML and medical imaging.
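To make the core idea concrete, here is a minimal sketch in Python with NumPy. It is not any of the methods listed on this page: it uses plain class-mean matching, a far simpler stand-in for trajectory matching or generative approaches, on toy Gaussian data. The function names (`distill_class_means`, `predict_1nn`) and the toy data are assumptions made for illustration only.

```python
import numpy as np

def distill_class_means(X, y):
    """Build one synthetic sample per class: the mean of that class's real samples.

    Illustrative stand-in only; real distillation methods optimize the
    synthetic samples against a training objective.
    """
    classes = np.unique(y)
    X_syn = np.stack([X[y == c].mean(axis=0) for c in classes])
    return X_syn, classes

def predict_1nn(X_train, y_train, X_query):
    """Label each query point with the label of its nearest training point."""
    sq_dist = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[sq_dist.argmin(axis=1)]

rng = np.random.default_rng(0)

# Toy "large" dataset: three well-separated Gaussian blobs, 200 points each.
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.concatenate([c + rng.normal(size=(200, 2)) for c in centers])
y = np.repeat(np.arange(3), 200)

# Held-out points drawn from the same blobs.
X_test = np.concatenate([c + rng.normal(size=(100, 2)) for c in centers])
y_test = np.repeat(np.arange(3), 100)

# Distill 600 real samples down to 3 synthetic ones.
X_syn, y_syn = distill_class_means(X, y)

acc_full = (predict_1nn(X, y, X_test) == y_test).mean()
acc_syn = (predict_1nn(X_syn, y_syn, X_test) == y_test).mean()
print(f"real set: {len(X)} samples, test accuracy {acc_full:.3f}")
print(f"synthetic set: {len(X_syn)} samples, test accuracy {acc_syn:.3f}")
```

On data this easy, a nearest-neighbor classifier fitted to the three synthetic points should perform about as well as one fitted to all 600 real points. That is the property dataset distillation tries to preserve for deep networks and realistic data, where choosing the synthetic samples is much harder.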
Papers
Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization
Xinhao Zhong, Shuoyang Sun, Xulin Gu, Zhaoyang Xu, Yaowei Wang, Jianlong Wu, Bin Chen
Going Beyond Feature Similarity: Effective Dataset distillation based on Class-aware Conditional Mutual Information
Xinhao Zhong, Bin Chen, Hao Fang, Xulin Gu, Shu-Tao Xia, En-Hui Yang
Distill the Best, Ignore the Rest: Improving Dataset Distillation with Loss-Value-Based Pruning
Brian B. Moser, Federico Raue, Tobias C. Nauen, Stanislav Frolov, Andreas Dengel
Color-Oriented Redundancy Reduction in Dataset Distillation
Bowen Yuan, Zijian Wang, Mahsa Baktashmotlagh, Yadan Luo, Zi Huang
Dataset Distillers Are Good Label Denoisers In the Wild
Lechao Cheng, Kaifeng Chen, Jiyang Li, Shengeng Tang, Shufei Zhang, Meng Wang