Direct Preference Knowledge Distillation

Direct Preference Knowledge Distillation (DPKD) is a technique for training smaller, more efficient language and vision-language models by learning from larger, more capable teacher models. Rather than directly mimicking the teacher's output distribution, DPKD trains the student to reproduce the teacher's preferences between candidate outputs, typically through ranking losses or preference-based objective functions. Current research focuses on improving the efficiency and accuracy of this preference learning, particularly for large language models (LLMs) and large vision-language models (LVLMs), and on addressing challenges such as miscalibration and limited access to teacher model internals. This approach promises to make advanced AI capabilities more accessible by enabling the deployment of smaller, faster models with comparable performance.
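
To make the idea of preference-based distillation concrete, below is a minimal PyTorch sketch of one possible objective in this spirit: the teacher ranks a "chosen" versus a "rejected" response, and the student is trained with a DPO-style logistic loss, using the teacher as the reference model, to reproduce that preference. The function names, tensor shapes, and the exact form of the loss are illustrative assumptions, not the formulation of any specific DPKD paper.

```python
# Illustrative sketch of a preference-based distillation loss (not the exact
# objective from any particular DPKD paper). The teacher serves as a frozen
# reference model, and the student is pushed to prefer the teacher-chosen
# response over the rejected one via a DPO-style logistic loss.
import torch
import torch.nn.functional as F


def sequence_logprob(logits, labels, mask):
    """Sum of per-token log-probabilities of `labels` under `logits`.

    logits: (batch, seq_len, vocab); labels, mask: (batch, seq_len)
    """
    logp = torch.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return (token_logp * mask).sum(dim=-1)


def preference_distillation_loss(
    student_chosen_logits, student_rejected_logits,
    teacher_chosen_logits, teacher_rejected_logits,
    chosen_labels, rejected_labels,
    chosen_mask, rejected_mask,
    beta=0.1,
):
    # Sequence-level log-probabilities under the student.
    s_chosen = sequence_logprob(student_chosen_logits, chosen_labels, chosen_mask)
    s_rejected = sequence_logprob(student_rejected_logits, rejected_labels, rejected_mask)

    # The teacher acts as a frozen reference; no gradients flow through it.
    with torch.no_grad():
        t_chosen = sequence_logprob(teacher_chosen_logits, chosen_labels, chosen_mask)
        t_rejected = sequence_logprob(teacher_rejected_logits, rejected_labels, rejected_mask)

    # Margin by which the student favours the chosen response more than the teacher does.
    margin = (s_chosen - t_chosen) - (s_rejected - t_rejected)

    # Logistic (DPO-style) loss: encourage the student to rank the chosen response higher.
    return -F.logsigmoid(beta * margin).mean()


if __name__ == "__main__":
    # Toy usage with random tensors standing in for model outputs.
    B, T, V = 2, 8, 32
    labels = torch.randint(0, V, (B, T))
    mask = torch.ones(B, T)
    loss = preference_distillation_loss(
        torch.randn(B, T, V, requires_grad=True), torch.randn(B, T, V, requires_grad=True),
        torch.randn(B, T, V), torch.randn(B, T, V),
        labels, labels, mask, mask,
    )
    loss.backward()
    print(float(loss))
```

In practice the "chosen" and "rejected" responses would come from the teacher's own ranking of sampled candidates, and the student logits would be produced by the model being distilled; the hyperparameter `beta` controls how sharply the preference margin is enforced.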

Papers