Cross-Modal Knowledge Distillation
Cross-modal knowledge distillation (CMKD) aims to transfer knowledge learned by a "teacher" model in a data-rich modality (e.g., LiDAR, audio) to a "student" model in a data-scarce modality (e.g., camera, text), improving the student's performance without requiring extensive labeled data for direct training. Current research focuses on bridging the mismatch between modalities through techniques such as disentanglement learning, adversarial training, and alignment at the feature, output, or label level within teacher-student architectures. CMKD's significance lies in its ability to leverage readily available data from one modality to enhance performance in another, yielding more efficient and robust models across applications such as autonomous driving, speech recognition, and multi-modal learning.
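To make the teacher-student alignment idea concrete, the following is a minimal sketch of a combined output-level and feature-level distillation loss, assuming PyTorch. The class name, projection layer, feature dimensions, temperature, and loss weights are illustrative assumptions, not taken from any specific CMKD method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CMKDLoss(nn.Module):
    """Combines output-level (soft-label) and feature-level alignment
    between a teacher in one modality and a student in another."""

    def __init__(self, student_dim, teacher_dim, temperature=4.0,
                 w_output=1.0, w_feature=0.5):
        super().__init__()
        # Project student features into the teacher's feature space
        # so representations from the two modalities can be compared.
        self.proj = nn.Linear(student_dim, teacher_dim)
        self.T = temperature
        self.w_output = w_output
        self.w_feature = w_feature

    def forward(self, student_logits, teacher_logits,
                student_feat, teacher_feat):
        # Output-level alignment: KL divergence between softened
        # teacher and student class distributions.
        kd = F.kl_div(
            F.log_softmax(student_logits / self.T, dim=-1),
            F.softmax(teacher_logits / self.T, dim=-1),
            reduction="batchmean",
        ) * (self.T ** 2)

        # Feature-level alignment: MSE between projected student
        # features and frozen teacher features.
        feat = F.mse_loss(self.proj(student_feat), teacher_feat.detach())

        return self.w_output * kd + self.w_feature * feat


# Usage: e.g., a camera-based student distilled from a LiDAR-based teacher.
# Random tensors stand in for real backbone outputs.
if __name__ == "__main__":
    batch, s_dim, t_dim, n_cls = 8, 256, 512, 10
    student_feat = torch.randn(batch, s_dim)
    teacher_feat = torch.randn(batch, t_dim)
    student_logits = torch.randn(batch, n_cls)
    teacher_logits = torch.randn(batch, n_cls)

    loss_fn = CMKDLoss(s_dim, t_dim)
    loss = loss_fn(student_logits, teacher_logits, student_feat, teacher_feat)
    print(loss.item())
```

In practice, the teacher is typically pretrained and frozen, and only the student (plus the projection layer) receives gradients; label-level alignment, when used, adds a standard supervised loss on the student's logits alongside the terms above.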