Token Pruning
Token pruning aims to accelerate inference and reduce the memory footprint of large transformer models, particularly Vision Transformers (ViTs) and multimodal models, by selectively removing less important tokens. Current research focuses on developing efficient pruning strategies, often using attention weights or other importance scores to decide which tokens to remove, and exploring techniques such as token merging or compensation to mitigate the resulting performance loss. These advances matter because they enable the deployment of powerful, computationally expensive models on resource-constrained devices and improve the efficiency of large-scale training and inference.
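To make the core idea concrete, below is a minimal sketch of attention-based token pruning in PyTorch. It assumes a standard ViT layout where the first token is a [CLS] token, and scores each patch token by the attention the [CLS] token pays to it in the preceding layer; the function name, keep ratio, and tensor shapes are illustrative, not any specific paper's method.

```python
import torch

def prune_tokens(x, attn, keep_ratio=0.5):
    """Keep the most important patch tokens, scored by [CLS] attention.

    x:    (B, N, D) token embeddings; x[:, 0] is the [CLS] token.
    attn: (B, H, N, N) attention weights from the preceding layer.
    Returns pruned tokens of shape (B, 1 + k, D).
    """
    B, N, D = x.shape
    # Importance of each patch token = attention it receives from the
    # [CLS] token (query index 0), averaged over the H heads.
    cls_attn = attn[:, :, 0, 1:].mean(dim=1)          # (B, N-1)
    k = max(1, int(cls_attn.size(1) * keep_ratio))
    topk = cls_attn.topk(k, dim=1).indices            # (B, k)
    idx = topk.unsqueeze(-1).expand(-1, -1, D)        # (B, k, D)
    patches = x[:, 1:].gather(dim=1, index=idx)       # keep top-k patches
    return torch.cat([x[:, :1], patches], dim=1)      # re-attach [CLS]

# Usage with ViT-B/16-like shapes: 196 patch tokens + 1 [CLS], 12 heads.
x = torch.randn(2, 197, 768)
attn = torch.softmax(torch.randn(2, 12, 197, 197), dim=-1)
print(prune_tokens(x, attn).shape)                    # torch.Size([2, 99, 768])
```

Hard-dropping tokens this way is the simplest variant; merging-based approaches instead average similar or low-importance tokens into survivors, which tends to preserve more information at the same compute budget.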