Token Pruning
Token pruning aims to accelerate inference and reduce the memory footprint of large transformer models, particularly Vision Transformers (ViTs) and multimodal models, by selectively removing less important tokens. Current research focuses on efficient pruning strategies, often using attention weights or other importance scores to identify tokens for removal, and on techniques such as token merging or compensation to mitigate the resulting performance loss. These advances matter because they enable the deployment of powerful, computationally expensive models on resource-constrained devices and improve the efficiency of large-scale training and inference.
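As a rough sketch of the core idea, the snippet below (PyTorch) scores each patch token by the attention it receives from the CLS token, keeps the top fraction, and fuses the dropped tokens into a single compensation token. The function name, the CLS-attention scoring rule, and the `keep_ratio` parameter are illustrative assumptions for this sketch, not any specific paper's method.

```python
import torch

def prune_and_fuse(tokens: torch.Tensor, attn: torch.Tensor,
                   keep_ratio: float = 0.5) -> torch.Tensor:
    """Drop unimportant patch tokens, fusing them into one extra token.

    tokens: (B, N, D) token sequence with the CLS token at index 0.
    attn:   (B, H, N, N) attention weights from the preceding block.
    """
    B, N, D = tokens.shape

    # Score each patch token by the attention it receives from the CLS
    # token, averaged over heads (one common importance signal; other
    # scores can be substituted here).
    score = attn[:, :, 0, 1:].mean(dim=1)                       # (B, N-1)

    num_keep = max(1, int(keep_ratio * (N - 1)))
    order = score.argsort(dim=1, descending=True)
    keep_idx, drop_idx = order[:, :num_keep], order[:, num_keep:]

    patches = tokens[:, 1:, :]                                  # exclude CLS
    kept = patches.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    dropped = patches.gather(1, drop_idx.unsqueeze(-1).expand(-1, -1, D))

    # Token-merging compensation: collapse the dropped tokens into a
    # single token, weighted by their importance scores, so their
    # information is not entirely discarded.
    w = score.gather(1, drop_idx).softmax(dim=1).unsqueeze(-1)  # (B, N-1-k, 1)
    fused = (w * dropped).sum(dim=1, keepdim=True)              # (B, 1, D)

    # Reattach CLS so downstream blocks see a valid, shorter sequence.
    return torch.cat([tokens[:, :1, :], kept, fused], dim=1)
```

Applied between transformer blocks, this shrinks the sequence length (and hence the quadratic attention cost) for all subsequent layers; pruning without the fused token is the simpler variant, at the cost of losing the pruned tokens' information entirely.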