Token Reduction

Token reduction aims to improve the efficiency of large language and vision models by selectively reducing the number of input tokens they process, without significantly sacrificing performance. Current research focuses on algorithms, typically integrated into Vision Transformers (ViTs) and other transformer-based architectures, that identify and remove redundant or less informative tokens through pruning, merging, and voting mechanisms. These advances are crucial for deploying large models on resource-constrained devices and for scaling model capabilities while keeping computational costs in check, with applications ranging from image recognition and video processing to natural language understanding.
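
To make the pruning idea concrete, below is a minimal sketch of one widely used variant: scoring each patch token by the attention it receives from the [CLS] token and keeping only the top-scoring fraction, in the spirit of attention-based pruning methods such as EViT. The function name `prune_tokens`, the keep ratio, and the tensor shapes are illustrative assumptions, not the interface of any specific paper listed below.

```python
import torch

def prune_tokens(x: torch.Tensor, attn: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Keep the patch tokens that receive the most attention from [CLS].

    x:    (batch, num_tokens, dim) token embeddings; token 0 is [CLS].
    attn: (batch, num_heads, num_tokens, num_tokens) attention weights
          from the preceding self-attention layer.
    """
    b, n, d = x.shape
    # Importance score: CLS-row attention to each patch token, averaged over heads.
    cls_attn = attn[:, :, 0, 1:].mean(dim=1)              # (batch, n - 1)
    num_keep = max(1, int((n - 1) * keep_ratio))
    # Top-scoring patch tokens; +1 offsets past the CLS position.
    topk = cls_attn.topk(num_keep, dim=1).indices + 1     # (batch, num_keep)
    topk, _ = topk.sort(dim=1)                            # preserve spatial order
    # Always retain the CLS token at position 0.
    cls_idx = torch.zeros(b, 1, dtype=torch.long, device=x.device)
    keep = torch.cat([cls_idx, topk], dim=1)              # (batch, num_keep + 1)
    return x.gather(1, keep.unsqueeze(-1).expand(-1, -1, d))

# Toy usage: 197 tokens (CLS + 14x14 patches), 768-dim embeddings, 12 heads.
x = torch.randn(1, 197, 768)
attn = torch.softmax(torch.randn(1, 12, 197, 197), dim=-1)
print(prune_tokens(x, attn, keep_ratio=0.5).shape)  # torch.Size([1, 99, 768])
```

Merging-based methods follow the same skeleton but, instead of discarding the low-scoring tokens, fold them into the retained tokens they are most similar to (as in ToMe), so information is compressed rather than dropped.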

Papers