Knowledge Distillation
Knowledge distillation is a machine learning technique that transfers knowledge from a large, complex "teacher" model to a smaller, more efficient "student" model, with the goal of retaining as much of the teacher's performance as possible while reducing computational cost. Current research focuses on improving distillation methods across model architectures, including convolutional neural networks, transformers, and large language models, often combining them with parameter-efficient fine-tuning, multi-task learning, and data augmentation to strengthen knowledge transfer. The approach matters because it enables high-performing models to be deployed on resource-constrained devices and addresses challenges around model size, training time, and privacy in applications such as image captioning, speech processing, and medical diagnosis.
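To make the teacher-to-student transfer concrete, the sketch below shows the classic soft-target distillation loss in the spirit of Hinton et al. (2015): the student is trained on a blend of ordinary cross-entropy against ground-truth labels and a KL-divergence term that matches the student's temperature-softened outputs to the teacher's. This is a minimal illustration, assuming PyTorch and a teacher/student pair that emit logits over the same classes; the `temperature` and `alpha` values are illustrative defaults, not prescriptions from any of the papers listed here.

```python
# Minimal sketch of soft-target knowledge distillation (assumes PyTorch).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with a KL term that pulls the
    student's softened distribution toward the teacher's."""
    # Soften both output distributions with the temperature T.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between softened outputs; the T^2 factor keeps
    # gradient magnitudes comparable as the temperature changes.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Standard supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1.0 - alpha) * ce_term

# Typical use in a training step (teacher frozen, student trainable):
# with torch.no_grad():
#     teacher_logits = teacher(inputs)
# loss = distillation_loss(student(inputs), teacher_logits, labels)
# loss.backward()
```

Many of the papers below replace or augment this logit-matching objective (e.g., cross-tokenizer transport or cross-modality features), but the teacher-signal-plus-supervised-loss structure is the common starting point.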
Papers
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
Self-Evolution Knowledge Distillation for LLM-based Machine Translation
SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance