Token Repetition
Token repetition in large language models (LLMs) and other transformer-based architectures is an active research area concerned with identifying, mitigating, and in some cases exploiting repeated tokens and their effects on model performance and efficiency. Current work includes methods for detecting the source of repeated tokens, reducing redundancy through pruning and pooling, and designing decoding strategies such as parallel decoding that speed up generation while suppressing repetition. Addressing token repetition matters both for efficiency, by enabling more resource-efficient models, and for the reliability, safety, and trustworthiness of model outputs. As a concrete illustration, see the sketch below.
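The following is a minimal sketch, not taken from any specific paper, of two common building blocks: a model-agnostic measure of how much a generated token sequence repeats itself, and the widely used repetition-penalty heuristic applied to raw logits before sampling. Function names, the penalty value, and the toy sequences are illustrative assumptions.

```python
from collections import Counter

def repeated_ngram_rate(token_ids, n=3):
    """Fraction of n-grams in a token sequence that occur more than once.

    A simple way to quantify repetition in generated text: values near 0
    mean almost all n-grams are unique, values near 1 indicate looping output.
    """
    if len(token_ids) < n:
        return 0.0
    ngrams = [tuple(token_ids[i:i + n]) for i in range(len(token_ids) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Down-weight logits of tokens that already appear in the output.

    Mirrors the common repetition-penalty heuristic: divide a previously
    generated token's logit if it is positive, multiply it if negative,
    so the token becomes less likely to be sampled again either way.
    """
    logits = list(logits)  # work on a copy
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits

if __name__ == "__main__":
    # Toy example: a sequence that loops over the same three tokens
    # versus one with no repeated n-grams at all.
    looping = [5, 9, 2] * 6
    varied = list(range(18))
    print(repeated_ngram_rate(looping))  # 1.0 (every 3-gram recurs)
    print(repeated_ngram_rate(varied))   # 0.0
```

Metrics of this kind are typically computed over a model's sampled outputs to compare decoding strategies, while the penalty function would be applied inside the generation loop at each decoding step.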