Long Context LLM
Long-context LLMs aim to overcome the fixed context-window limits of conventional LLMs by processing significantly longer input sequences, enabling more comprehensive understanding and generation of text. Current research focuses on improving efficiency through optimized attention mechanisms (e.g., sparse attention, hierarchical pruning), efficient key-value cache management (e.g., quantization, eviction strategies), and data-centric methods that strengthen long-context performance during training and fine-tuning. These advances are crucial for applications that must process extensive text, such as complex question answering, document summarization, and large-scale information retrieval, while keeping the computational cost of longer contexts manageable.
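The sketch below is a rough, self-contained illustration of two of the efficiency ideas mentioned above: a local-window sparse attention mask that also keeps a few always-visible "sink" tokens, and a fixed-budget key-value cache with a simple oldest-first eviction rule. The names (`sparse_attention_mask`, `EvictingKVCache`), the budget and window sizes, and the eviction policy are illustrative assumptions, not the method of any paper listed here.

```python
# Toy sketch of sparse attention masking and KV-cache eviction (illustrative only).
import numpy as np


def sparse_attention_mask(seq_len: int, window: int = 64, n_sink: int = 4) -> np.ndarray:
    """Causal mask that keeps only a local window plus the first `n_sink` tokens."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for q in range(seq_len):
        lo = max(0, q - window + 1)
        mask[q, lo:q + 1] = True              # local causal window
        mask[q, :min(n_sink, q + 1)] = True   # always-visible "sink" tokens
    return mask


class EvictingKVCache:
    """Fixed-budget KV cache: when full, evict the oldest entry after the sink region."""

    def __init__(self, budget: int, n_sink: int = 4):
        self.budget, self.n_sink = budget, n_sink
        self.keys, self.values = [], []

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        self.keys.append(k)
        self.values.append(v)
        if len(self.keys) > self.budget:
            del self.keys[self.n_sink]        # drop oldest non-sink key
            del self.values[self.n_sink]      # and its value

    def attend(self, q: np.ndarray) -> np.ndarray:
        """Softmax attention of a single query over the retained cache."""
        K = np.stack(self.keys)               # (cache_len, d)
        V = np.stack(self.values)             # (cache_len, d)
        scores = K @ q / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V


if __name__ == "__main__":
    m = sparse_attention_mask(seq_len=256)
    print(m.sum() / m.size)                   # fraction of positions actually attended

    d, cache = 16, EvictingKVCache(budget=128)
    rng = np.random.default_rng(0)
    for _ in range(1000):                      # simulate a long generation
        cache.append(rng.normal(size=d), rng.normal(size=d))
    print(len(cache.keys))                     # stays at the 128-token budget
    print(cache.attend(rng.normal(size=d)).shape)
```

Both pieces trade a small amount of attention coverage for memory and compute that stay bounded as the sequence grows, which is the basic motivation shared by the sparse-attention and cache-eviction lines of work summarized above.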
Papers
Can Many-Shot In-Context Learning Help LLMs as Evaluators? A Preliminary Empirical Study
Mingyang Song, Mao Zheng, Xuan Luo
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang