Token Pruning
Token pruning aims to accelerate inference and reduce the memory footprint of large transformer models, particularly Vision Transformers (ViTs) and multimodal models, by selectively removing less important tokens. Current research focuses on efficient pruning strategies that use attention weights or other importance scores to decide which tokens to drop, along with techniques such as token merging or compensation to mitigate the resulting accuracy loss; the sketch below illustrates the basic idea. These advances are significant because they enable powerful, computationally expensive models to run on resource-constrained devices and make large-scale training and inference more efficient.
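To make the mechanism concrete, here is a minimal PyTorch sketch of attention-score-based pruning with an EViT-style compensation step: patch tokens are ranked by the attention they receive from the [CLS] token, the top fraction is kept, and the pruned remainder is fused into a single weighted token rather than discarded. The function name `prune_tokens`, the `cls_attn` input, and the `keep_ratio` parameter are illustrative assumptions, not any specific paper's API.

```python
import torch

def prune_tokens(tokens, cls_attn, keep_ratio=0.5):
    """Keep the top-k patch tokens ranked by [CLS] attention and fuse
    the rest into one compensation token (an EViT-style heuristic).

    tokens:   (B, N, D) patch tokens, excluding the [CLS] token
    cls_attn: (B, N) attention from [CLS] to each patch token,
              averaged over heads (assumed available from the ViT layer)
    """
    B, N, D = tokens.shape
    num_keep = max(1, int(N * keep_ratio))

    # Rank tokens by importance: attention received from [CLS].
    idx = cls_attn.argsort(dim=1, descending=True)
    keep_idx, drop_idx = idx[:, :num_keep], idx[:, num_keep:]

    keep = tokens.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    drop = tokens.gather(1, drop_idx.unsqueeze(-1).expand(-1, -1, D))

    # Fuse pruned tokens into a single token, weighted by their
    # attention scores, so their information is not lost outright.
    w = cls_attn.gather(1, drop_idx).softmax(dim=1).unsqueeze(-1)
    fused = (drop * w).sum(dim=1, keepdim=True)

    return torch.cat([keep, fused], dim=1)  # (B, num_keep + 1, D)

# Example: halve 196 ViT-B/16 patch tokens, plus one fused token.
tokens = torch.randn(2, 196, 768)
cls_attn = torch.rand(2, 196)
out = prune_tokens(tokens, cls_attn, keep_ratio=0.5)
print(out.shape)  # torch.Size([2, 99, 768])
```

Shrinking the sequence from N to roughly N * keep_ratio tokens cuts each subsequent attention layer's cost quadratically, which is where the inference speedup comes from; the fused token is one simple form of the compensation the literature explores.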
Papers: 37 (December 28, 2024 to March 30, 2025)