Online Knowledge Distillation

Online knowledge distillation (OKD) is a machine learning technique in which smaller "student" models and their "teacher" counterparts (or a cohort of peer students) are trained simultaneously and learn from one another, removing the need for a large, pre-trained teacher. Current research focuses on improving the knowledge-transfer mechanism itself, in particular on challenges such as model homogenization (peers converging to similar, less informative predictions) and efficient knowledge representation across architectures including convolutional neural networks (CNNs), vision transformers (ViTs), and graph neural networks (GNNs), often with the help of contrastive learning and attention mechanisms. OKD's significance lies in its potential to reduce the computational cost of distillation and to improve models deployed on resource-constrained devices, with applications ranging from computer vision and natural language processing to reinforcement learning and personalized education.

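As a concrete illustration, the sketch below shows one common OKD setup, mutual learning between two peer classifiers trained from scratch in Python with PyTorch: each peer is supervised by the ground-truth labels and by the softened predictions of the other peer. The network sizes, temperature, and optimizer settings are illustrative assumptions, not drawn from any particular paper.

    # Minimal sketch of online knowledge distillation via mutual learning.
    # Two peers are trained jointly on the same mini-batches; there is no
    # pre-trained teacher. All hyperparameters here are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def okd_losses(logits_a, logits_b, targets, temperature=2.0):
        """Cross-entropy on ground truth plus a mutual KL term for each peer."""
        T = temperature
        ce_a = F.cross_entropy(logits_a, targets)
        ce_b = F.cross_entropy(logits_b, targets)
        # Each peer distills from the other's softened predictions; the
        # "teacher" side of each term is detached so gradients only flow
        # into the peer being taught.
        kl_a = F.kl_div(F.log_softmax(logits_a / T, dim=1),
                        F.softmax(logits_b.detach() / T, dim=1),
                        reduction="batchmean") * T * T
        kl_b = F.kl_div(F.log_softmax(logits_b / T, dim=1),
                        F.softmax(logits_a.detach() / T, dim=1),
                        reduction="batchmean") * T * T
        return ce_a + kl_a, ce_b + kl_b

    # Two small peer networks trained from scratch -- no pre-trained teacher.
    peer_a = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    peer_b = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
    opt = torch.optim.SGD(list(peer_a.parameters()) + list(peer_b.parameters()), lr=0.01)

    # One synthetic training step (replace with a real data loader in practice).
    x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
    loss_a, loss_b = okd_losses(peer_a(x), peer_b(x), y)
    opt.zero_grad()
    (loss_a + loss_b).backward()
    opt.step()

The same pattern extends to larger cohorts (sum the KL terms over all other peers) or to a teacher trained alongside the students; the defining choice is that every model providing supervision is updated in the same training run.
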
Papers