Dataset Distillation
Dataset distillation aims to create small synthetic datasets that retain the essential information of much larger originals, reducing the computational cost and storage required to train deep learning models. Current research focuses on making these synthetic datasets cheaper to produce and more generalizable across model architectures, often using techniques such as knowledge distillation, generative models (e.g., diffusion models, GANs), and training-trajectory matching. The field matters because it addresses the growing cost of data and computation in deep learning, with applications ranging from faster training to data sharing in resource-constrained settings such as TinyML and medical imaging.
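To make the core idea concrete, below is a minimal sketch of one common family of approaches, gradient matching: synthetic images are optimized so that the gradients they induce in a small model resemble the gradients induced by real data. This is not the implementation from any of the listed papers; the architecture, data shapes (MNIST-like 1x28x28 grayscale images with labels 0-9), and hyperparameters are illustrative assumptions, and it assumes PyTorch.

```python
# Minimal dataset-distillation sketch via gradient matching (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    # Tiny convolutional classifier; any differentiable model could stand in here.
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        nn.Linear(16 * 4 * 4, 10),
    )

def grad_match_loss(model, real_x, real_y, syn_x, syn_y):
    # Compare parameter gradients of the classification loss on real vs. synthetic batches.
    params = [p for p in model.parameters() if p.requires_grad]
    g_real = torch.autograd.grad(F.cross_entropy(model(real_x), real_y), params)
    g_syn = torch.autograd.grad(F.cross_entropy(model(syn_x), syn_y), params,
                                create_graph=True)
    loss = 0.0
    for gr, gs in zip(g_real, g_syn):
        # Distance = 1 - cosine similarity between flattened gradient tensors.
        loss = loss + (1 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0))
    return loss

def distill(real_loader, images_per_class=10, num_classes=10, steps=200, device="cpu"):
    # Learnable synthetic dataset: a few images per class, with fixed labels.
    syn_x = torch.randn(images_per_class * num_classes, 1, 28, 28,
                        device=device, requires_grad=True)
    syn_y = torch.arange(num_classes, device=device).repeat_interleave(images_per_class)
    opt = torch.optim.Adam([syn_x], lr=0.1)

    real_iter = iter(real_loader)
    for _ in range(steps):
        # Re-initialize the model each step so the synthetic set does not
        # overfit to a single random initialization.
        model = make_model().to(device)
        try:
            real_x, real_y = next(real_iter)
        except StopIteration:
            real_iter = iter(real_loader)
            real_x, real_y = next(real_iter)
        real_x, real_y = real_x.to(device), real_y.to(device)

        opt.zero_grad()
        loss = grad_match_loss(model, real_x, real_y, syn_x, syn_y)
        loss.backward()
        opt.step()
    return syn_x.detach(), syn_y
```

Trajectory-matching methods extend this idea by matching longer segments of training dynamics rather than single gradient steps, and generative approaches instead learn a model (e.g., a diffusion model) that produces the synthetic samples.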
Papers
QuickDrop: Efficient Federated Unlearning by Integrated Dataset Distillation
Akash Dhasade, Yaohong Ding, Song Guo, Anne-Marie Kermarrec, Martijn de Vos, Leijie Wu
Dataset Distillation in Latent Space
Yuxuan Duan, Jianfu Zhang, Liqing Zhang
Efficient Dataset Distillation via Minimax Diffusion
Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev, Haonan Wang, Wei Jiang, Yang You, Yiran Chen