Token Pruning
Token pruning aims to accelerate inference and reduce the memory footprint of large transformer models, particularly Vision Transformers (ViTs) and multimodal models, by selectively removing less important tokens. Current research focuses on developing efficient pruning strategies, often using attention weights or other importance scores to identify tokens for removal, and on techniques like token merging or compensation to mitigate the resulting performance loss. These advances matter because they enable the deployment of powerful, computationally expensive models on resource-constrained devices and improve the efficiency of large-scale training and inference.
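To make the idea concrete, the sketch below shows one common flavor of attention-based token pruning in PyTorch. It is a minimal illustration, not a specific published method: it assumes token importance is taken from the CLS token's attention to each patch token, keeps a fixed fraction of the highest-scoring tokens, and merges the pruned tokens into a single weighted "compensation" token. The function name and keep ratio are hypothetical choices for the example.

```python
# Minimal sketch of attention-score-based token pruning for a ViT block.
# Assumptions (not from the source): importance = CLS attention to each patch
# token, a fixed keep ratio, and pruned tokens fused into one compensation
# token rather than discarded outright.
import torch


def prune_tokens(tokens: torch.Tensor,
                 cls_attn: torch.Tensor,
                 keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep the most important patch tokens, merge the rest into one token.

    tokens:   (B, 1 + N, D) -- CLS token followed by N patch tokens.
    cls_attn: (B, N)        -- attention from CLS to each patch token,
                               averaged over heads.
    """
    B, n_plus_1, D = tokens.shape
    N = n_plus_1 - 1
    num_keep = max(1, int(N * keep_ratio))

    cls_tok, patch_tok = tokens[:, :1], tokens[:, 1:]

    # Indices of the top-k most attended patch tokens per example.
    keep_idx = cls_attn.topk(num_keep, dim=1).indices            # (B, k)
    keep_mask = torch.zeros(B, N, dtype=torch.bool, device=tokens.device)
    keep_mask.scatter_(1, keep_idx, True)

    kept = patch_tok[keep_mask].view(B, num_keep, D)

    # Compensation: fuse pruned tokens into a single token, weighted by their
    # renormalized attention scores, to soften the information loss.
    pruned = patch_tok[~keep_mask].view(B, N - num_keep, D)
    pruned_w = cls_attn[~keep_mask].view(B, N - num_keep, 1)
    pruned_w = pruned_w / pruned_w.sum(dim=1, keepdim=True).clamp_min(1e-6)
    fused = (pruned * pruned_w).sum(dim=1, keepdim=True)         # (B, 1, D)

    return torch.cat([cls_tok, kept, fused], dim=1)              # (B, 1+k+1, D)


if __name__ == "__main__":
    B, N, D = 2, 196, 768                     # e.g. ViT-B/16 on a 224x224 input
    tokens = torch.randn(B, 1 + N, D)
    cls_attn = torch.rand(B, N).softmax(dim=1)
    out = prune_tokens(tokens, cls_attn, keep_ratio=0.5)
    print(out.shape)                          # torch.Size([2, 100, 768])
```

In practice such a step would be inserted after the attention layer of selected transformer blocks, so later blocks operate on progressively shorter sequences; variants differ mainly in how the importance score is computed and in whether pruned tokens are dropped, merged, or compensated for.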