Knowledge Distillation
Knowledge distillation is a machine learning technique that transfers knowledge from a large, complex "teacher" model to a smaller, more efficient "student" model, with the goal of boosting the student's performance beyond what it would reach when trained alone while keeping its computational cost low. Current research focuses on improving distillation methods across model architectures, including convolutional neural networks, vision transformers, and large language models, often combining them with parameter-efficient fine-tuning, multi-task learning, and data augmentation to strengthen knowledge transfer. The approach matters because it enables high-performing models to be deployed on resource-constrained devices and helps address challenges around model size, training time, and privacy in applications such as image captioning, speech processing, and medical diagnosis. A minimal sketch of the core idea follows the paragraph below.
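To make the teacher-student setup concrete, here is a minimal sketch of classic logit-based distillation in PyTorch. The temperature, loss weighting, and the toy linear "teacher" and "student" models are illustrative assumptions, not taken from any of the papers listed below; real systems use the methods those papers describe.

```python
# Minimal sketch of teacher-student knowledge distillation (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target KL term (teacher guidance) with hard-label cross-entropy."""
    # Soften both output distributions with the temperature, then match them with KL divergence.
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_loss = F.kl_div(soft_student, soft_teacher,
                       log_target=True, reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * kd_loss + (1.0 - alpha) * ce_loss

# Usage: run the frozen teacher and the trainable student on the same batch,
# then backpropagate only through the student. Both models here are placeholders.
teacher = nn.Linear(32, 10).eval()   # stands in for a large pretrained model
student = nn.Linear(32, 10)          # smaller model being trained
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
with torch.no_grad():
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits, y)
loss.backward()
```

The temperature spreads probability mass over non-target classes so the student can learn the teacher's "dark knowledge" about class similarities; the alpha weight trades this guidance off against the ordinary supervised objective.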
Papers
InFiConD: Interactive No-code Fine-tuning with Concept-based Knowledge Distillation
Jinbin Huang, Wenbin He, Liang Gou, Liu Ren, Chris Bryan
Towards Optimal Trade-offs in Knowledge Distillation for CNNs and Vision Transformers at the Edge
John Violos, Symeon Papadopoulos, Ioannis Kompatsiaris
Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels
Nicholas Pangakis, Samuel Wolken
Dual-Space Knowledge Distillation for Large Language Models
Songming Zhang, Xue Zhang, Zengkui Sun, Yufeng Chen, Jinan Xu
Leveraging Knowledge Distillation for Lightweight Skin Cancer Classification: Balancing Accuracy and Computational Efficiency
Niful Islam, Khan Md Hasib, Fahmida Akter Joti, Asif Karim, Sami Azam
The Privileged Students: On the Value of Initialization in Multilingual Knowledge Distillation
Haryo Akbarianto Wibowo, Thamar Solorio, Alham Fikri Aji
Factual Dialogue Summarization via Learning from Large Language Models
Rongxin Zhu, Jey Han Lau, Jianzhong Qi
Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study
Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang
xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics
Daniil Larionov, Mikhail Seleznyov, Vasiliy Viskov, Alexander Panchenko, Steffen Eger
Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs
Junjie Wang, Mingyang Chen, Binbin Hu, Dan Yang, Ziqi Liu, Yue Shen, Peng Wei, Zhiqiang Zhang, Jinjie Gu, Jun Zhou, Jeff Z. Pan, Wen Zhang, Huajun Chen
Can Low-Rank Knowledge Distillation in LLMs be Useful for Microelectronic Reasoning?
Nirjhor Rouf, Fin Amin, Paul D. Franzon
BiLD: Bi-directional Logits Difference Loss for Large Language Model Distillation
Minchong Li, Feng Zhou, Xiaohui Song
Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation
Yuhang Zhou, Jing Zhu, Paiheng Xu, Xiaoyu Liu, Xiyao Wang, Danai Koutra, Wei Ai, Furong Huang
Mutual Learning for Finetuning Click-Through Rate Prediction Models
Ibrahim Can Yilmaz, Said Aldemir
Self and Cross-Model Distillation for LLMs: Effective Methods for Refusal Pattern Alignment
Jie Li, Yi Liu, Chongyang Liu, Xiaoning Ren, Ling Shi, Weisong Sun, Yinxing Xue
Graph Knowledge Distillation to Mixture of Experts
Pavel Rumiantsev, Mark Coates