Masked Distillation

Masked distillation is a machine-learning technique that improves the performance of smaller "student" models by transferring knowledge from larger, more powerful "teacher" models while portions of the input (for example, image patches) are masked out, so the student learns to match the teacher's representations or predictions from incomplete views of the data. Current research focuses on refining masking strategies, often guiding them with attention mechanisms or using asymmetric masking ratios between teacher and student, and on applying the technique to architectures such as vision transformers and convolutional neural networks across tasks including object detection, image classification, and video understanding. The approach offers an effective route to model compression, enabling efficient deployment of high-performing models on resource-constrained devices, while also advancing our understanding of knowledge transfer and model training dynamics.
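As a rough illustration of the idea, the sketch below pairs a frozen teacher with a smaller student over patch tokens, masks a large fraction of the student's input, and aligns student features with teacher features on the positions the student never saw, with an asymmetric masking ratio between the two. The `TokenEncoder`, `random_mask`, and `masked_distillation_loss` names, the zero-fill masking, and the specific ratios are illustrative assumptions, not the method of any particular paper.

```python
# Minimal masked-distillation sketch (assumed setup, not a specific paper's method).
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_mask(tokens: torch.Tensor, mask_ratio: float) -> torch.Tensor:
    """Return a boolean mask of shape (B, N); True marks masked tokens."""
    B, N, _ = tokens.shape
    num_masked = int(N * mask_ratio)
    ids = torch.rand(B, N, device=tokens.device).argsort(dim=1)[:, :num_masked]
    mask = torch.zeros(B, N, dtype=torch.bool, device=tokens.device)
    mask.scatter_(1, ids, True)
    return mask


class TokenEncoder(nn.Module):
    """Toy stand-in for a ViT-style encoder over pre-embedded patch tokens."""
    def __init__(self, dim: int, depth: int):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)


def masked_distillation_loss(student: nn.Module,
                             teacher: nn.Module,
                             tokens: torch.Tensor,
                             student_mask_ratio: float = 0.75,
                             teacher_mask_ratio: float = 0.0) -> torch.Tensor:
    """Student sees heavily masked tokens (zeroed here); the frozen teacher sees
    lightly masked or full tokens (asymmetric masking).  The loss matches student
    features to teacher features only on positions hidden from the student."""
    mask_s = random_mask(tokens, student_mask_ratio)                # (B, N)
    student_in = tokens.masked_fill(mask_s.unsqueeze(-1), 0.0)

    with torch.no_grad():
        if teacher_mask_ratio > 0:
            mask_t = random_mask(tokens, teacher_mask_ratio)
            teacher_in = tokens.masked_fill(mask_t.unsqueeze(-1), 0.0)
        else:
            teacher_in = tokens
        teacher_feat = teacher(teacher_in)

    student_feat = student(student_in)

    # Per-token feature distance, averaged only over the masked positions.
    diff = F.mse_loss(student_feat, teacher_feat, reduction="none").mean(-1)
    return (diff * mask_s).sum() / mask_s.sum().clamp(min=1)


if __name__ == "__main__":
    B, N, D = 2, 16, 64                          # batch, tokens, embedding dim
    tokens = torch.randn(B, N, D)                # pre-embedded patch tokens
    teacher = TokenEncoder(D, depth=4).eval()    # larger, frozen teacher
    student = TokenEncoder(D, depth=2)           # smaller student
    loss = masked_distillation_loss(student, teacher, tokens)
    loss.backward()                              # gradients flow to the student only
    print(f"masked distillation loss: {loss.item():.4f}")
```

In practice the teacher is a pretrained model and the distillation target may be logits, attention maps, or intermediate features; the feature-matching MSE and random masking above are just one simple instantiation of the general recipe.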

Papers