Distilled Dataset

Dataset distillation aims to create significantly smaller, synthetic datasets that retain the essential information of much larger original datasets, enabling faster and more efficient training of machine learning models. Current research focuses on improving the quality and robustness of these distilled datasets, exploring techniques like matching-based methods, diffusion models, and the strategic use of soft labels to address issues such as class imbalance and cross-architecture generalization. This field is significant because it offers solutions to the computational and storage challenges posed by massive datasets, impacting areas like federated learning, resource-constrained applications, and model compression.

Papers

March 6, 2024

Latent Dataset Distillation with Diffusion Models
Brian B. Moser, Federico Raue, Sebastian Palacio, Stanislav Frolov, Andreas Dengel
Diffusion Model Dataset Distillation Low Temperature Distillation Distilled Dataset Distillation Method

February 20, 2024

Improve Cross-Architecture Generalization on Dataset Distillation
Binglin Zhou, Linhao Zhong, Wentao Chen
Knowledge Distillation Dataset Distillation Pooling Layer Distilled Dataset Cross Architecture Generalization

December 6, 2023

On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm
Peng Sun, Bei Shi, Daiwei Yu, Tao Lin
Knowledge Distillation High Resolution Diversity Awareness Dataset Distillation Distilled Dataset

November 29, 2023

Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching
Shitong Shao, Zeyuan Yin, Muxin Zhou, Xindong Zhang, Zhiqiang Shen
Synthetic Dataset High Performing 1D ConvNet Tiny ImageNet Dataset Condensation Distilled Dataset Numerous Cutting Edge Backbone

November 27, 2023

November 13, 2023

Embarassingly Simple Dataset Distillation
Yunzhen Feng, Ramakrishna Vedantam, Julia Kempe
Dataset Distillation Distilled Dataset

September 8, 2023

Towards Mitigating Architecture Overfitting on Distilled Datasets
Xuyang Zhong, Chen Liu
Training Data Generalization Performance Dataset Distillation Distilled Dataset

July 24, 2023

Rethinking Data Distillation: Do Not Overlook Calibration
Dongyao Zhu, Bowen Lei, Jie Zhang, Yanbo Fang, Ruqi Zhang, Yiqun Xie, Dongkuan Xu
Calibration Performance Dataset Distillation Temperature Scaling Distilled Dataset Data Distillation Overconfident Prediction Masked Distillation

July 7, 2023

Distilled Pruning: Using Synthetic Data to Win the Lottery
Luke McDermott, Daniel Cummings
Neural Network Deep Learning Model Synthetic Data Lottery Ticket Distilled Dataset

January 31, 2023

Differentially Private Kernel Inducing Points using features from ScatterNets (DP-KIP-ScatterNet) for Privacy Preserving Data Distillation
Margarita Vinaroz, Mi Jung Park
Feature Wise Density Estimation Distilled Dataset Data Distillation Reduce SCATTER Pixel Distillation

October 30, 2022

Dataset Distillation via Factorization
Songhua Liu, Kai Wang, Xingyi Yang, Jingwen Ye, Xinchao Wang
Dataset Distillation Distilled Dataset Prime Factorization

June 1, 2022

Dataset Distillation using Neural Feature Regression
Yongchao Zhou, Ehsan Nezhadarya, Jimmy Ba
Synthetic Dataset Dataset Distillation Deep Regression Distilled Dataset Meta Gradient Meta Dataset

December 22, 2021

Self-Distillation Mixup Training for Non-autoregressive Neural Machine Translation
Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, Yuxia Wang, Zongyao Li, Zhengzhe Yu, Zhanglin Wu, Yimeng Chen, Chang Su, Min Zhang, Lizhi Lei, shimin tao, Hao Yang
Self Distillation Distilled Dataset Sequence Level Knowledge Distillation Teaching Dimension Non Autoregressive Neural Machine Translation