Long Context
Research on long context in large language models (LLMs) aims to enhance their ability to process and reason over input sequences far longer than traditional context windows allow. Current work emphasizes novel attention mechanisms (e.g., sparse attention, differential attention) and efficient memory management techniques (e.g., compression, retrieval augmentation) to overcome the computational and memory bottlenecks that longer contexts introduce. This area is crucial for complex tasks that require a holistic understanding of extensive information, such as question answering, summarization, and multi-modal reasoning, and it advances both the scientific understanding of LLMs and their practical applications.
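To make the sparse-attention idea concrete, below is a minimal, illustrative sketch of causal sliding-window attention, one common form of sparse attention in which each position attends only to a fixed-size window of preceding positions, reducing the quadratic cost of full attention to roughly linear in sequence length. The function name, window size, and toy dimensions are assumptions for illustration and do not correspond to any specific method from the papers listed here.

```python
import numpy as np

def sliding_window_attention(q, k, v, window=4):
    """Causal sliding-window (sparse) attention sketch: each query position i
    attends only to keys in [i - window + 1, i], giving O(n * window) cost
    instead of the O(n^2) cost of full attention."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo = max(0, i - window + 1)                      # start of local causal window
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)       # scaled dot-product scores
        weights = np.exp(scores - scores.max())          # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ v[lo:i + 1]                   # weighted sum of local values
    return out

# Toy usage: 16 positions, 8-dim vectors, window of 4 (all sizes are illustrative).
rng = np.random.default_rng(0)
q = rng.standard_normal((16, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
print(sliding_window_attention(q, k, v).shape)  # (16, 8)
```

Production systems implement this with blocked, vectorized kernels rather than a Python loop, but the restriction of each query to a local key window is the core of the sparsity pattern.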
Papers
Folded context condensation in Path Integral formalism for infinite context transformers
Won-Gi Paeng, Daesuk Kwon
FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference
Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu
Long Context Alignment with Short Instructions and Synthesized Positions
Wenhao Wu, Yizhong Wang, Yao Fu, Xiang Yue, Dawei Zhu, Sujian Li