Token Dropping
Token dropping accelerates the training and inference of large transformer models by selectively omitting less important tokens during computation. Current research focuses on improving token-selection strategies, often leveraging attention scores or semantic-consistency measures within architectures such as BERT and Vision Transformers (ViTs), to minimize performance loss while maximizing speedups. The approach can substantially reduce the computational cost of training and deploying large language and vision-language models, making them more practical in resource-constrained environments and enabling more efficient applications.
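As a rough illustration of attention-based selection, the sketch below scores each token by the attention the [CLS] token pays to it and keeps only the top-scoring fraction. The function name, tensor shapes, and scoring rule are illustrative assumptions, not the exact procedure of any particular paper.

```python
import torch

def drop_tokens_by_attention(tokens, attn_weights, keep_ratio=0.5):
    """Keep only the most-attended tokens (hypothetical helper).

    tokens:       (batch, seq_len, dim) token embeddings, CLS token at index 0
    attn_weights: (batch, heads, seq_len, seq_len) attention matrix from one layer
    keep_ratio:   fraction of non-CLS tokens to retain
    """
    # Importance of each non-CLS token = attention the CLS token pays to it,
    # averaged over heads (one of several scoring choices in the literature).
    cls_attn = attn_weights[:, :, 0, 1:].mean(dim=1)        # (batch, seq_len-1)

    num_keep = max(1, int(cls_attn.size(1) * keep_ratio))
    topk = cls_attn.topk(num_keep, dim=1).indices           # (batch, num_keep)

    # Gather the kept tokens (shifting indices past CLS) and prepend CLS.
    batch_idx = torch.arange(tokens.size(0)).unsqueeze(1)
    kept = tokens[batch_idx, topk + 1]
    return torch.cat([tokens[:, :1], kept], dim=1)

# Toy usage: 2 sequences of 8 tokens, 4 attention heads.
x = torch.randn(2, 8, 16)
attn = torch.softmax(torch.randn(2, 4, 8, 8), dim=-1)
pruned = drop_tokens_by_attention(x, attn, keep_ratio=0.5)
print(pruned.shape)  # torch.Size([2, 4, 16]) -> CLS + 3 kept tokens
```

In practice, the dropped tokens are simply excluded from subsequent layers (or, in some training-time schemes, routed through a lighter path), which is where the compute savings come from.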