Context Extrapolation
Context extrapolation in large language models (LLMs) focuses on enabling models to process and reason over input sequences far longer than the context window they were trained on. Current research explores both novel architectures and training-free methods, such as enhanced positional-embedding techniques and memory-based approaches, to improve the accuracy and efficiency of handling extended contexts while mitigating issues like the "lost-in-the-middle" effect and hallucinations. These advances are crucial for real-world applications that must process long documents or continuous streams of information, such as question answering and LLM-driven agents. The ultimate goal is more robust and reliable LLMs capable of complex tasks that demand a broader contextual understanding.
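To make the embedding-based, training-free idea concrete, below is a minimal sketch of one well-known technique in this family: position interpolation for rotary position embeddings (RoPE). The specific numbers (a 2048-token trained window, an 8192-token target) are illustrative assumptions, not values from this text; the point is that positions beyond the trained window are rescaled back into the range the model saw during training, so the rotation angles stay in-distribution.

```python
import math

def rope_angles(pos, dim, base=10000.0, scale=1.0):
    """Rotary-embedding rotation angles for one token position.

    With scale > 1 this implements position interpolation: the position
    index is compressed by `scale` before the angles are computed, so
    positions beyond the trained context map back inside it.
    """
    return [(pos / scale) * base ** (-2 * i / dim) for i in range(dim // 2)]

trained_ctx = 2048                 # assumed training context length
target_ctx = 8192                  # assumed longer target context
scale = target_ctx / trained_ctx   # interpolation factor (4.0 here)

far_pos = 6000                     # a position outside the trained window
plain = rope_angles(far_pos, dim=8)
interp = rope_angles(far_pos, dim=8, scale=scale)

# 6000 / 4 = 1500 < 2048: the effective position falls back inside the
# trained range, so every rotation angle shrinks by the same factor.
```

Because the method only rescales position indices at inference time, it requires no retraining, which is what makes it attractive as a training-free approach; the trade-off is some loss of positional resolution between nearby tokens.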