Long Context Language Models
Long-context language models aim to process far longer input sequences than standard large language models (LLMs), enabling more comprehensive understanding and generation of text. Current research focuses on more effective training methods, evaluation across diverse and realistic tasks that go beyond simple retrieval benchmarks, and more efficient attention mechanisms for handling extremely long contexts. These capabilities matter for applications that depend on extensive contextual information, such as complex question answering, document summarization, and code generation.
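To make the efficiency point concrete, the sketch below shows sliding-window (local) attention, one common way to cut the quadratic cost of full attention over long inputs. It is a minimal illustration only, not the method of any paper listed here; the function name, window size, and toy dimensions are assumptions for demonstration.

```python
# Illustrative sketch (not from any listed paper): sliding-window self-attention.
# Each query attends only to a fixed-size window of preceding keys, reducing the
# cost from O(n^2 * d) to O(n * window * d). All names and sizes are assumptions.
import numpy as np

def sliding_window_attention(q, k, v, window: int):
    """q, k, v: arrays of shape (seq_len, d). Query i attends to keys [i - window, i]."""
    seq_len, d = q.shape
    out = np.zeros_like(v)
    for i in range(seq_len):
        start = max(0, i - window)
        # Scaled dot-product scores against the local window only.
        scores = q[i] @ k[start : i + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[start : i + 1]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d = 1024, 64                                   # toy sizes
    q, k, v = (rng.standard_normal((seq_len, d)) for _ in range(3))
    y = sliding_window_attention(q, k, v, window=128)
    print(y.shape)                                          # (1024, 64)
```

Production systems typically combine such local patterns with other mechanisms (global tokens, retrieval, or distributed sequence parallelism, as several of the papers below explore), but the core trade-off of restricting the attention span is the same.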
Papers
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation
Kaijian Zou, Muhammad Khalifa, Lu Wang
LongSafetyBench: Long-Context LLMs Struggle with Safety Issues
Mianqiu Huang, Xiaoran Liu, Shaojun Zhou, Mozhi Zhang, Chenkun Tan, Pengyu Wang, Qipeng Guo, Zhe Xu, Linyang Li, Zhikai Lei, Linlin Li, Qun Liu, Yaqian Zhou, Xipeng Qiu, Xuanjing Huang
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda
MemLong: Memory-Augmented Retrieval for Long Text Modeling
Weijie Liu, Zecheng Tang, Juntao Li, Kehai Chen, Min Zhang