K TOKEN
"K Token" research broadly explores the representation and utilization of information units ("tokens") in large language and multimodal models, aiming to improve efficiency, accuracy, and context understanding. Current research focuses on novel tokenization methods for diverse data types (text, images, video, audio), developing model architectures (like transformers) that effectively process these tokens, and evaluating their performance on various tasks including question answering, generation, and semantic understanding. This work is significant for advancing the capabilities of large models, enabling more efficient and accurate processing of complex information, and impacting applications ranging from natural language processing to computer vision.
Papers
No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
June Yong Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, Se Jung Kwon, Dongsoo Lee
Collaborative Decoding of Critical Tokens for Boosting Factuality of Large Language Models
Lifeng Jin, Baolin Peng, Linfeng Song, Haitao Mi, Ye Tian, Dong Yu
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang
∞Bench: Extending Long Context Evaluation Beyond 100K Tokens
Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu, Maosong Sun
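The first paper in the list concerns KV cache compression, building on the observation that not all cached tokens need the same numeric precision. The NumPy sketch below illustrates that general idea only: a few "critical" tokens, ranked here by a stand-in importance score, are kept at a higher bit-width while the rest are quantized more aggressively. The helper names (`fake_quantize`, `compress_kv`), the bit-widths, and the importance heuristic are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def fake_quantize(x, bits):
    """Uniform symmetric quantization to `bits` bits, returned dequantized."""
    levels = 2 ** (bits - 1) - 1
    scale = max(float(np.max(np.abs(x))) / levels, 1e-8)
    return np.clip(np.round(x / scale), -levels, levels) * scale

def compress_kv(kv, importance, keep_ratio=0.2, hi_bits=8, lo_bits=4):
    """Quantize a KV cache: critical tokens keep hi_bits, the rest get lo_bits."""
    n = kv.shape[0]
    k = max(1, int(n * keep_ratio))
    critical = np.zeros(n, dtype=bool)
    critical[np.argsort(importance)[-k:]] = True  # top-k most important tokens
    out = np.empty_like(kv)
    for i in range(n):
        out[i] = fake_quantize(kv[i], hi_bits if critical[i] else lo_bits)
    return out

# Toy usage: 16 cached tokens with head dimension 64.
rng = np.random.default_rng(0)
kv = rng.normal(size=(16, 64))
importance = rng.random(16)   # stand-in for, e.g., accumulated attention mass
print(np.abs(kv - compress_kv(kv, importance)).mean())  # mean reconstruction error
```

In practice, importance scores might come from accumulated attention weights, and the quantized values would be stored in packed integer form rather than dequantized immediately; the sketch keeps everything in floats to stay short and self-contained.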