Teacher Model
Teacher models are large, pre-trained models used in knowledge distillation to train smaller, more efficient student models that retain as much of the teacher's performance as possible. Current research focuses on improving the accuracy and efficiency of this knowledge transfer, exploring techniques such as data augmentation, loss function design (e.g., MSE loss), and frameworks such as multi-teacher and online distillation. This work is significant because it reduces the computational cost and resource requirements of deploying large language and vision models, making them more accessible for applications including object detection, natural language processing, and ecological monitoring.
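As a rough illustration of the knowledge transfer described above, the following minimal sketch computes a distillation loss between a student and one or more teachers. It is not taken from any of the listed papers. The function names, the example logits, and the choice to combine teachers by averaging their logits are all assumptions made for illustration. It shows two common loss variants: KL divergence between temperature-softened distributions, and MSE on raw logits.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def average_teachers(teacher_logits):
    """Combine several teachers by averaging logits (one simple choice, assumed here)."""
    n = len(teacher_logits)
    return [sum(column) / n for column in zip(*teacher_logits)]


def distillation_loss(student_logits, teacher_logits, temperature=2.0, use_mse=False):
    """Loss that pulls the student's outputs toward the teacher's outputs."""
    if use_mse:
        # MSE variant: match the raw logits directly.
        return mse(student_logits, teacher_logits)
    # Soft-target variant: match softened distributions, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


# Hypothetical logits for a 3-class problem (illustrative values only).
teachers = [[4.0, 1.0, 0.5], [3.0, 2.0, 0.0]]
student = [2.0, 1.5, 0.2]

target = average_teachers(teachers)
kd_loss = distillation_loss(student, target)
mse_loss = distillation_loss(student, target, use_mse=True)
print(kd_loss, mse_loss)
```

In practice this term is usually added to the ordinary supervised loss on ground-truth labels, with a weighting factor that is tuned per task.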
Papers
Knowledge Distillation for Adaptive MRI Prostate Segmentation Based on Limit-Trained Multi-Teacher Models
Eddardaa Ben Loussaief, Hatem Rashwan, Mohammed Ayad, Mohammed Zakaria Hassan, Domenec Puig
DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model
Yanzhe Fu, Yueteng Kang, Songjun Cao, Long Ma