Long Context
Long-context research in large language models (LLMs) aims to let models process and reason over input sequences far longer than traditional context windows allow. Current work develops new attention mechanisms (e.g., sparse attention, differential attention) and efficient memory management techniques (e.g., compression, retrieval augmentation) to overcome the computational and memory bottlenecks that longer contexts create. These advances matter for complex tasks that require a holistic understanding of extensive information, such as question answering, summarization, and multi-modal reasoning, and they bear on both the scientific understanding of LLMs and their practical applications.
Papers
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling
Yingfa Chen, Xinrong Zhang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun
InAttention: Linear Context Scaling for Transformers
Joseph Eisner
FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding
Jingyang Deng, Zhengyang Shen, Boyang Wang, Lixin Su, Suqi Cheng, Ying Nie, Junfeng Wang, Dawei Yin, Jinwen Ma
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Lijie Yang, Zhihao Zhang, Zhuofu Chen, Zikun Li, Zhihao Jia
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
Xinyu Liu, Runsong Zhao, Pengcheng Huang, Chunyang Xiao, Bei Li, Jingang Wang, Tong Xiao, Jingbo Zhu
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
Lei Wang, Shan Dong, Yuhui Xu, Hanze Dong, Yalu Wang, Amrita Saha, Ee-Peng Lim, Caiming Xiong, Doyen Sahoo
ALR$^2$: A Retrieve-then-Reason Framework for Long-context Question Answering
Huayang Li, Pat Verga, Priyanka Sen, Bowen Yang, Vijay Viswanathan, Patrick Lewis, Taro Watanabe, Yixuan Su
MELODI: Exploring Memory Compression for Long Contexts
Yinpeng Chen, DeLesley Hutchins, Aren Jansen, Andrey Zhmoginov, David Racz, Jesper Andersen
UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation
Zixuan Li, Jing Xiong, Fanghua Ye, Chuanyang Zheng, Xun Wu, Jianqiao Lu, Zhongwei Wan, Xiaodan Liang, Chengming Li, Zhenan Sun, Lingpeng Kong, Ngai Wong
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen, Tianyu Gao, Minmin Hou, Ke Ding, Daniel Fleischer, Peter Izsak, Moshe Wasserblat, Danqi Chen
How to Train Long-Context Language Models (Effectively)
Tianyu Gao, Alexander Wettig, Howard Yen, Danqi Chen