Sentence-Level Distillation

Sentence-level distillation is a machine learning technique that transfers knowledge from a large, complex "teacher" model to a smaller, more efficient "student" model, focusing on aligning the models' overall sentence representations rather than individual token-level predictions. Current research explores optimizing this process through techniques such as self-supervised learning, particularly within transformer architectures, and investigates the interplay between sentence-level and token-level distillation, sometimes combining the two for improved performance. This methodology is crucial for compressing large language models, enabling their deployment on resource-constrained devices and facilitating multilingual applications, especially for low-resource languages where data is scarce.
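As a rough illustration of how such an objective might be set up, the PyTorch-style sketch below aligns mean-pooled sentence embeddings between a teacher and a student, and optionally mixes in a token-level KL term to reflect the combined sentence/token approach mentioned above. It is a minimal sketch under simplifying assumptions: the function names, the mean-pooling choice, the cosine-based alignment, the temperature, and the 0.5 mixing weight are all illustrative, not taken from any specific paper, and the teacher and student are assumed to share the same hidden size.

```python
import torch
import torch.nn.functional as F


def sentence_distillation_loss(student_hidden, teacher_hidden, attention_mask):
    """Align mean-pooled sentence embeddings of student and teacher.

    student_hidden / teacher_hidden: (batch, seq_len, dim) token-level hidden states.
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    Assumes both models produce hidden states of the same dimension.
    """
    mask = attention_mask.unsqueeze(-1).float()
    # Mean-pool over non-padding tokens to get one vector per sentence.
    student_sent = (student_hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    teacher_sent = (teacher_hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    # Cosine-based alignment: push student sentence vectors toward the teacher's.
    return 1.0 - F.cosine_similarity(student_sent, teacher_sent, dim=-1).mean()


def combined_distillation_loss(student_logits, teacher_logits,
                               student_hidden, teacher_hidden,
                               attention_mask, alpha=0.5, temperature=2.0):
    """Illustrative mix of sentence-level and token-level distillation."""
    t = temperature
    # Token-level term: KL divergence between temperature-softened distributions.
    token_kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    # Sentence-level term: embedding alignment defined above.
    sent_loss = sentence_distillation_loss(student_hidden, teacher_hidden, attention_mask)
    # alpha is a hypothetical weighting; in practice it would be tuned per task.
    return alpha * sent_loss + (1.0 - alpha) * token_kl
```

In a training loop, the student's hidden states and logits would be computed with gradients enabled, the teacher's under `torch.no_grad()`, and this loss (possibly alongside a standard supervised loss) would drive the student's updates.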

Papers