Attention Distillation

Attention distillation is a machine learning technique that transfers knowledge from a larger, more complex "teacher" model to a smaller, more efficient "student" model by aligning the student's attention maps with those of the teacher. Current research emphasizes applications across diverse areas, including image classification, object detection, semantic segmentation, and even graph neural networks, often employing vision transformers (ViTs) and convolutional neural networks (CNNs). This approach is significant for model compression, improving performance on resource-constrained devices, enhancing robustness (e.g., against backdoor attacks), and addressing challenges in continual learning and anomaly detection.
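As a concrete illustration, the sketch below shows one common formulation of an attention-distillation loss: spatial attention maps are derived from paired teacher and student feature maps and matched with a mean-squared-error term. The activation-based attention map, the layer pairing, and the function names are assumptions chosen for clarity, not a reference to any particular paper listed here.

```python
import torch
import torch.nn.functional as F


def attention_map(features: torch.Tensor) -> torch.Tensor:
    """Collapse a feature map of shape (B, C, H, W) into a spatial attention
    map of shape (B, H*W) by summing squared activations over channels,
    then L2-normalising so teacher and student maps are comparable."""
    attn = features.pow(2).sum(dim=1).flatten(start_dim=1)  # (B, H*W)
    return F.normalize(attn, p=2, dim=1)


def attention_distillation_loss(student_feats, teacher_feats):
    """Mean squared difference between normalised attention maps,
    averaged over the layer pairs chosen for distillation.
    `student_feats` and `teacher_feats` are lists of (B, C, H, W) tensors."""
    losses = []
    for s, t in zip(student_feats, teacher_feats):
        # Resize the student map if its spatial size differs from the teacher's.
        if s.shape[-2:] != t.shape[-2:]:
            s = F.interpolate(s, size=t.shape[-2:], mode="bilinear",
                              align_corners=False)
        # The teacher is frozen, so its features are detached from the graph.
        losses.append(F.mse_loss(attention_map(s), attention_map(t.detach())))
    return torch.stack(losses).mean()
```

In training, this term is typically added to the ordinary task loss with a weighting coefficient, e.g. `loss = task_loss + beta * attention_distillation_loss(student_feats, teacher_feats)`, so the student learns both the task and where the teacher attends.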

Papers