Knowledge Distillation
Knowledge distillation is a machine learning technique that transfers knowledge from a large, complex "teacher" model to a smaller, more efficient "student" model, so that the student approaches the teacher's performance at a fraction of the computational cost. Current research focuses on improving distillation methods across model architectures, including convolutional neural networks, transformers, and large language models, often combining them with parameter-efficient fine-tuning, multi-task learning, and data augmentation to enhance knowledge transfer. The approach matters because it enables high-performing models to run on resource-constrained devices and mitigates challenges around model size, training time, and privacy in applications such as image captioning, speech processing, and medical diagnosis.
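To make the teacher-to-student transfer concrete, below is a minimal sketch of the standard soft-target distillation loss (temperature-scaled KL divergence blended with hard-label cross-entropy, in the style of Hinton et al.). It assumes PyTorch; the tiny linear `student` and `teacher` models, the batch of random data, and the `temperature` and `alpha` values are illustrative placeholders, not taken from any of the papers listed here.

```python
# Sketch of the classic soft-target distillation loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a temperature-softened KL term with ordinary cross-entropy."""
    # Soft targets: KL divergence between softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage on one batch of dummy data (placeholder models and shapes).
student = torch.nn.Linear(32, 10)   # small student
teacher = torch.nn.Linear(32, 10)   # stands in for a large pretrained teacher
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
with torch.no_grad():
    t_logits = teacher(x)            # teacher runs in inference mode only
loss = distillation_loss(student(x), t_logits, y)
loss.backward()
```

The weighting `alpha` and the temperature are tuning knobs; many of the papers below vary exactly these ingredients (what signal is distilled, how it is aligned, and how the losses are combined) for specific architectures and tasks.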
Papers
Knowledge Distillation with Training Wheels
Implicit Word Reordering with Knowledge Distillation for Cross-Lingual Dependency Parsing
Improving the Transferability of Adversarial Examples by Inverse Knowledge Distillation
CoT2Align: Cross-Chain of Thought Distillation via Optimal Transport Alignment for Language Models with Different Tokenizers
A Transformer-in-Transformer Network Utilizing Knowledge Distillation for Image Recognition