Attention-Based Knowledge Distillation
Attention-based knowledge distillation (KD) compresses large, complex neural networks by training a smaller "student" model to mimic a larger "teacher" model, with attention mechanisms used to transfer knowledge from the teacher's intermediate layers rather than from its output logits alone. Current research explores a range of attention strategies, including spatial- and frequency-domain approaches, and applies them across diverse architectures such as convolutional neural networks (CNNs) and graph neural networks (GNNs), often in combination with techniques like contrastive learning. The approach matters because it improves efficiency and reduces computational cost in applications such as image classification, object detection, and speech processing, enabling the deployment of smaller, faster models without significant loss of accuracy.
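To make the intermediate-layer transfer concrete, below is a minimal PyTorch sketch of spatial attention transfer: each feature map is collapsed into a normalized spatial attention map, and the student is penalized for the distance between its maps and the teacher's, alongside the usual soft-label KD term. The function names, layer pairing, and hyperparameters (`temperature`, `alpha`, `beta`) are illustrative assumptions, not a reference implementation of any specific paper.

```python
import torch
import torch.nn.functional as F


def attention_map(features: torch.Tensor) -> torch.Tensor:
    # features: (batch, channels, H, W) activations from an intermediate layer.
    # Collapse channels by averaging squared activations, then L2-normalize the
    # flattened spatial map so teacher and student maps are comparable
    # regardless of channel count or activation scale.
    a = features.pow(2).mean(dim=1)           # (batch, H, W)
    return F.normalize(a.flatten(1), dim=1)   # (batch, H*W)


def attention_transfer_loss(student_feats, teacher_feats) -> torch.Tensor:
    # Mean squared distance between normalized attention maps, summed over the
    # chosen pairs of intermediate layers (assumed to be matched by position).
    loss = torch.zeros((), device=student_feats[0].device)
    for fs, ft in zip(student_feats, teacher_feats):
        if fs.shape[-2:] != ft.shape[-2:]:
            # Assumption: bilinear resizing is acceptable when spatial sizes differ.
            fs = F.interpolate(fs, size=ft.shape[-2:], mode="bilinear",
                               align_corners=False)
        loss = loss + (attention_map(fs) - attention_map(ft)).pow(2).mean()
    return loss


def distillation_loss(student_logits, teacher_logits, labels,
                      student_feats, teacher_feats,
                      temperature=4.0, alpha=0.9, beta=1000.0):
    # Soft-label KD on the logits (temperature-scaled KL divergence) ...
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    # ... plus the hard-label cross-entropy and the attention-transfer term.
    ce = F.cross_entropy(student_logits, labels)
    at = attention_transfer_loss(student_feats, teacher_feats)
    return alpha * kd + (1 - alpha) * ce + beta * at
```

In practice the feature lists would come from forward hooks on a few chosen teacher and student layers; the large weight on the attention term reflects that the normalized maps have small magnitudes, and would need tuning for a given architecture pair.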