Paper ID: 2503.23455 • Published Mar 30, 2025
Efficient Token Compression for Vision Transformer with Spatial Information Preserved
Junzhu Mao, Yang Shen, Jinyang Guo, Yazhou Yao, Xiansheng Hua
Nanjing University of Science and Technology•Beihang University•Terminus Group•University of Electronic Science and Technology of China
Token compression is essential for reducing the computational and memory
requirements of transformer models, enabling their deployment in
resource-constrained environments. In this work, we propose an efficient and
hardware-compatible token compression method called Prune and Merge. Our
approach integrates token pruning and merging operations within transformer
models to achieve layer-wise token compression. By introducing trainable merge
and reconstruct matrices and utilizing shortcut connections, we efficiently
merge tokens while preserving important information and enabling the
restoration of pruned tokens. Additionally, we introduce a novel
gradient-weighted attention scoring mechanism that computes token importance
scores during the training phase, eliminating the need for separate
computations during inference and enhancing compression efficiency. We also
leverage gradient information to capture the global impact of tokens and
automatically identify optimal compression structures. Extensive experiments on
the ImageNet-1k and ADE20K datasets validate the effectiveness of our approach,
achieving significant speed-ups with minimal accuracy degradation compared to
state-of-the-art methods. For instance, on DeiT-Small, we achieve a
1.64× speed-up with only a 0.2% drop in accuracy on ImageNet-1k.
Moreover, by compressing segmenter models and comparing with existing methods,
we further demonstrate the superiority of our approach in both efficiency and
effectiveness. Code and models are available at
this https URL
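To make the described mechanism concrete, the sketch below shows one way layer-wise prune-and-merge could look in PyTorch: a trainable merge matrix that folds tokens into a smaller set, a trainable reconstruct matrix plus shortcut connection that allows pruned tokens to be restored, and a gradient-weighted attention score. This is a minimal illustration under stated assumptions, not the authors' released implementation; the class name `PruneAndMergeSketch`, the tensor shapes, and the exact form of the score are hypothetical.

```python
import torch
import torch.nn as nn


class PruneAndMergeSketch(nn.Module):
    """Illustrative layer-wise token compression (assumed form, not the paper's code).

    Merges N input tokens into K tokens with a trainable merge matrix, and keeps a
    trainable reconstruct matrix plus a shortcut so the full token set can be
    restored (e.g., for dense prediction on ADE20K).
    """

    def __init__(self, num_tokens: int, keep_tokens: int, dim: int):
        super().__init__()
        self.keep = keep_tokens
        # Trainable merge matrix: (K, N), initialized to select the first K tokens.
        self.merge = nn.Parameter(torch.eye(keep_tokens, num_tokens))
        # Trainable reconstruct matrix: (N, K), maps merged tokens back to N slots.
        self.reconstruct = nn.Parameter(torch.eye(num_tokens, keep_tokens))

    @staticmethod
    def token_scores(attn: torch.Tensor, attn_grad: torch.Tensor) -> torch.Tensor:
        # Hypothetical gradient-weighted attention score: attention weights of shape
        # (B, H, N, N) multiplied by their gradients, averaged over heads and queries,
        # giving one importance score per key token (B, N). Computed during training,
        # so no extra scoring pass is needed at inference time.
        return (attn * attn_grad).abs().mean(dim=(1, 2))

    def forward(self, tokens: torch.Tensor):
        # tokens: (B, N, D). Merge to (B, K, D) with the trainable matrix, then
        # reconstruct to (B, N, D); the shortcut adds back the original tokens.
        merged = torch.einsum("kn,bnd->bkd", self.merge, tokens)
        restored = torch.einsum("nk,bkd->bnd", self.reconstruct, merged)
        return merged, restored + tokens


if __name__ == "__main__":
    x = torch.randn(2, 197, 384)            # e.g., a DeiT-Small token sequence
    layer = PruneAndMergeSketch(num_tokens=197, keep_tokens=120, dim=384)
    merged, restored = layer(x)
    print(merged.shape, restored.shape)      # (2, 120, 384) (2, 197, 384)
```

In this sketch the compressed sequence (`merged`) is what subsequent transformer blocks would process, while the reconstructed sequence illustrates how pruned tokens could be recovered when full spatial resolution is needed; how the gradient-weighted scores drive which tokens are merged is left out for brevity.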