Vision Transformer Pruning

Vision transformer (ViT) pruning reduces the computational cost and memory footprint of ViTs while preserving their accuracy by selectively removing less important components. Current research focuses on efficient pruning strategies along multiple dimensions (attention heads, neurons, and input tokens), on techniques such as matrix decomposition and progressive sparsity prediction, and on mitigating the biases that pruning can introduce. These advances are significant because they enable powerful ViT models to run on resource-constrained devices, broadening their applicability across computer vision tasks.
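
Below is a minimal sketch of one of these dimensions, token pruning, in the spirit of attention-score-based methods such as EViT: patch tokens that receive little attention from the [CLS] token are dropped mid-network. The function name and the `keep_ratio` parameter are illustrative, not taken from any specific paper.

```python
import torch


def prune_tokens(tokens: torch.Tensor, cls_attn: torch.Tensor,
                 keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep only the patch tokens the [CLS] token attends to most.

    tokens:   (batch, num_tokens, dim) embeddings, [CLS] at index 0.
    cls_attn: (batch, num_tokens - 1) attention from [CLS] to each patch
              token, averaged over heads.
    """
    batch, num_tokens, dim = tokens.shape
    num_keep = max(1, int(keep_ratio * (num_tokens - 1)))

    # Indices of the highest-scoring patch tokens (excluding [CLS]).
    _, top_idx = cls_attn.topk(num_keep, dim=1)      # (batch, num_keep)
    top_idx = top_idx + 1                            # shift past [CLS]

    # Gather the kept patch tokens and re-attach [CLS] in front.
    idx = top_idx.unsqueeze(-1).expand(-1, -1, dim)  # (batch, num_keep, dim)
    kept = tokens.gather(dim=1, index=idx)
    return torch.cat([tokens[:, :1], kept], dim=1)


# Usage: 196 patch tokens plus [CLS]; keep half the patches.
x = torch.randn(2, 197, 768)
attn = torch.rand(2, 196)
pruned = prune_tokens(x, attn, keep_ratio=0.5)
print(pruned.shape)  # torch.Size([2, 99, 768])
```

In practice such pruning is applied at a few intermediate layers, so early layers see the full token sequence and later layers operate on the reduced one; head and neuron pruning target the model's weights rather than its inputs and are typically decided offline from importance scores.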

Papers