Sentence-Level Distillation

Sentence-level distillation is a machine learning technique that transfers knowledge from a large, complex "teacher" model to a smaller, more efficient "student" model, focusing on aligning the models' overall sentence representations rather than individual token-level predictions. Current research explores optimizing this process through techniques such as self-supervised learning, particularly within transformer architectures, and investigates the interplay between sentence-level and token-level distillation, sometimes combining the two for improved performance. This methodology is crucial for compressing large language models, enabling their deployment on resource-constrained devices and facilitating multilingual applications, especially for low-resource languages where data is scarce.
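As a rough illustration of how such an objective might be set up, the PyTorch-style sketch below aligns mean-pooled sentence embeddings between a teacher and a student, and optionally mixes in a token-level KL term to reflect the combined sentence/token approach mentioned above. It is a minimal sketch under simplifying assumptions: the function names, the mean-pooling choice, the cosine-based alignment, the temperature, and the 0.5 mixing weight are all illustrative, not taken from any specific paper, and the teacher and student are assumed to share the same hidden size.

```python
import torch
import torch.nn.functional as F


def sentence_distillation_loss(student_hidden, teacher_hidden, attention_mask):
    """Align mean-pooled sentence embeddings of student and teacher.

    student_hidden / teacher_hidden: (batch, seq_len, dim) token-level hidden states.
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    Assumes both models produce hidden states of the same dimension.
    """
    mask = attention_mask.unsqueeze(-1).float()
    # Mean-pool over non-padding tokens to get one vector per sentence.
    student_sent = (student_hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    teacher_sent = (teacher_hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    # Cosine-based alignment: push student sentence vectors toward the teacher's.
    return 1.0 - F.cosine_similarity(student_sent, teacher_sent, dim=-1).mean()


def combined_distillation_loss(student_logits, teacher_logits,
                               student_hidden, teacher_hidden,
                               attention_mask, alpha=0.5, temperature=2.0):
    """Illustrative mix of sentence-level and token-level distillation."""
    t = temperature
    # Token-level term: KL divergence between temperature-softened distributions.
    token_kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    # Sentence-level term: embedding alignment defined above.
    sent_loss = sentence_distillation_loss(student_hidden, teacher_hidden, attention_mask)
    # alpha is a hypothetical weighting; in practice it would be tuned per task.
    return alpha * sent_loss + (1.0 - alpha) * token_kl
```

In a training loop, the student's hidden states and logits would be computed with gradients enabled, the teacher's under `torch.no_grad()`, and this loss (possibly alongside a standard supervised loss) would drive the student's updates.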

Papers