Optimal Transport Distillation

Optimal transport distillation transfers knowledge from a large, computationally expensive "teacher" model to a smaller, faster "student" model, typically by using an optimal transport cost, such as a Wasserstein or Sinkhorn distance, to align the student's output or feature distribution with the teacher's rather than a pointwise divergence. Research focuses on improving the accuracy and efficiency of this transfer for tasks such as image generation, code generation, and cross-lingual information retrieval, using methods like pairwise sample optimization and backward distillation to address challenges such as blurry outputs and training-inference discrepancies. The approach matters because it lets high-performing models be deployed at reduced computational cost: inference becomes faster and more broadly accessible, which in turn enables applications such as personalized learning in resource-constrained environments.
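
The central ingredient shared by these methods can be sketched as an entropy-regularized optimal transport (Sinkhorn) cost between a batch of teacher embeddings and the corresponding student embeddings, added to the student's task loss. The sketch below is illustrative rather than taken from any particular paper: the function name `sinkhorn_ot_loss`, the squared-Euclidean cost, the uniform marginals, and the hyperparameters `eps` and `n_iters` are assumptions, and the student is assumed to produce embeddings of the same dimension as the teacher.

```python
import torch

def sinkhorn_ot_loss(student_feats, teacher_feats, eps=0.05, n_iters=50):
    """Entropic OT cost between two batches of embeddings (uniform marginals)."""
    # Pairwise squared-Euclidean cost between student and teacher embeddings;
    # assumes both have shape (batch, dim) with the same feature dimension.
    cost = torch.cdist(student_feats, teacher_feats, p=2) ** 2
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n, device=cost.device)  # student marginal
    b = torch.full((m,), 1.0 / m, device=cost.device)  # teacher marginal
    K = torch.exp(-cost / eps)                         # Gibbs kernel
    u = torch.ones_like(a)
    v = torch.ones_like(b)
    for _ in range(n_iters):                           # Sinkhorn iterations
        u = a / (K @ v).clamp_min(1e-9)
        v = b / (K.t() @ u).clamp_min(1e-9)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)         # transport plan diag(u) K diag(v)
    return (plan * cost).sum()                         # <plan, cost>

# Hypothetical usage: combine the OT term with the task loss when training the student.
# student_feats = student(batch)           # (B, D), requires grad
# with torch.no_grad():
#     teacher_feats = teacher(batch)       # (B, D), frozen teacher
# loss = task_loss + 0.1 * sinkhorn_ot_loss(student_feats, teacher_feats)
```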

Papers