Logit Distillation

Logit distillation is a knowledge distillation technique that transfers knowledge from a large, complex "teacher" model to a smaller, more efficient "student" model by matching the models' output logits (the raw, pre-softmax scores from which class probabilities are computed). Current research emphasizes improving the accuracy and efficiency of this transfer, exploring variations such as ranking-based losses, adaptive knowledge transfer, and refined logits to address limitations of the traditional Kullback-Leibler (KL) divergence objective. These advances are significant because they enable the deployment of high-performing models on resource-constrained devices and improve the interpretability and efficiency of training large models across domains including image classification, natural language processing, and federated learning.

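The classic formulation matches temperature-softened teacher and student distributions with a KL-divergence loss. The snippet below is a minimal sketch of that baseline in PyTorch; the function name, default temperature, and the commented weighting scheme are illustrative assumptions, not a specific method from the papers listed here.

```python
import torch
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL-based logit distillation loss (Hinton-style baseline).

    Both inputs are raw logits of shape (batch, num_classes). The
    temperature softens both distributions so the student can learn
    from the teacher's relative similarities between classes.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

# Typical usage: blend with the standard cross-entropy on ground-truth labels
# (alpha is a hypothetical weighting hyperparameter).
# ce_loss = F.cross_entropy(student_logits, labels)
# total_loss = alpha * ce_loss + (1 - alpha) * logit_distillation_loss(student_logits, teacher_logits)
```

The ranking-based, adaptive, and refined-logit variants mentioned above replace or augment this KL term rather than change the overall teacher-student setup.
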
Papers