Long-Context LLMs
Long-context LLMs aim to overcome the fixed, relatively short context windows of conventional LLMs by processing much longer input sequences, enabling more comprehensive understanding and generation of text. Current research focuses on improving efficiency through optimized attention mechanisms (e.g., sparse attention, hierarchical pruning), efficient key-value (KV) cache management (e.g., quantization, eviction strategies), and data-driven approaches that enhance long-context performance during training and fine-tuning. These advances are crucial for applications that must process extensive text, such as complex question answering, document summarization, and large-scale information retrieval, while keeping the computational cost of longer contexts manageable.
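To make the KV-cache eviction idea concrete, here is a minimal sketch of a "sink plus recent window" policy in the spirit of streaming-attention methods (related to the streaming heads in DuoAttention, though not taken from that paper). The function name `evict_kv_cache` and the parameters `n_sink` and `n_recent` are illustrative, not from any of the listed papers:

```python
import torch

def evict_kv_cache(keys, values, n_sink=4, n_recent=252):
    """Hypothetical sink+window eviction sketch: keep the first
    n_sink tokens (attention sinks) and the n_recent most recent
    tokens, dropping everything in between.

    keys, values: tensors of shape [batch, heads, seq_len, head_dim]
    """
    seq_len = keys.shape[2]
    if seq_len <= n_sink + n_recent:
        # Cache still fits within the budget; nothing to evict.
        return keys, values
    kept_keys = torch.cat([keys[:, :, :n_sink], keys[:, :, -n_recent:]], dim=2)
    kept_values = torch.cat([values[:, :, :n_sink], values[:, :, -n_recent:]], dim=2)
    return kept_keys, kept_values

# Usage: a 1024-token cache is truncated to n_sink + n_recent = 256 entries.
k = torch.randn(1, 8, 1024, 64)
v = torch.randn(1, 8, 1024, 64)
k2, v2 = evict_kv_cache(k, v)
assert k2.shape[2] == 256
```

This keeps memory bounded regardless of context length; the trade-off is that evicted middle tokens become unreachable, which is why methods like DuoAttention reserve separate "retrieval" heads that keep the full cache.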
Papers
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, Dong Yu