Input Token

Input tokens are the fundamental units of information processed by large language models (LLMs), particularly in multimodal applications that combine text and visual data. Current research focuses on optimizing how tokens are represented and processed: developing efficient encodings for visual information (e.g., using 2D features as tokens) and applying techniques such as pruning and dynamic token selection to reduce compute without sacrificing accuracy. These advances are crucial for deploying LLMs in resource-constrained environments and for tasks such as autonomous driving and automated captioning, where efficiently processing large volumes of data is paramount.
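
As a rough illustration of the token-pruning idea mentioned above, the sketch below keeps only the top-k visual tokens ranked by a simple importance score (the scaled dot-product between each token and a [CLS]-style query vector). The scoring heuristic, shapes, and keep ratio are illustrative assumptions, not any specific paper's method.

```python
import torch

def prune_visual_tokens(visual_tokens, query, keep_ratio=0.25):
    """Keep only the highest-scoring visual tokens (illustrative sketch).

    visual_tokens: (batch, num_tokens, dim) -- encoded image patch tokens
    query:         (batch, dim)             -- e.g., a [CLS] or text summary vector
    keep_ratio:    fraction of tokens to retain (hypothetical default)
    """
    batch, num_tokens, dim = visual_tokens.shape
    # Importance score: scaled dot-product between the query and each token.
    scores = torch.einsum("bd,bnd->bn", query, visual_tokens) / dim ** 0.5
    k = max(1, int(num_tokens * keep_ratio))
    # Indices of the top-k tokens, re-sorted to preserve original token order.
    topk = scores.topk(k, dim=-1).indices.sort(dim=-1).values
    # Gather the selected tokens along the sequence dimension.
    idx = topk.unsqueeze(-1).expand(-1, -1, dim)
    return visual_tokens.gather(1, idx)

# Example: prune 196 patch tokens down to 49 before passing them to the LLM.
tokens = torch.randn(2, 196, 768)
cls = torch.randn(2, 768)
pruned = prune_visual_tokens(tokens, cls)
print(pruned.shape)  # torch.Size([2, 49, 768])
```

In practice, published methods vary in how they score tokens (attention maps, learned predictors, text-conditioned relevance) and whether pruning is static or dynamic per input; the fixed top-k selection here is just one simple instance of the general approach.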

Papers