Long Sequence
Long sequence modeling focuses on efficiently processing and understanding data with extremely long temporal or spatial dependencies, a challenge for many machine learning models. Current research emphasizes two directions: novel architectures such as state-space models (SSMs), notably Mamba and its variants, and adaptations of existing models such as Transformers via sparse attention and efficient caching mechanisms. These advances are crucial for applications including natural language processing, speech recognition, and image analysis, where long-range dependencies are critical for accurate interpretation and generation. The ultimate goal is models that handle arbitrarily long sequences while maintaining computational efficiency and accuracy.
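To make the appeal of SSMs concrete, the sketch below shows the core idea they build on: a linear recurrence that processes a sequence in time linear in its length, rather than the quadratic cost of full self-attention. This is a minimal illustrative example, not the actual Mamba implementation (which adds input-dependent parameters, gating, and hardware-aware parallel scans); all names and shapes here are assumptions chosen for clarity.

```python
# Minimal sketch of a diagonal linear state-space recurrence:
#   h_t = A * h_{t-1} + B * x_t,   y_t = C . h_t
# Runs left-to-right in O(T) time with O(1) state per step.
import numpy as np

def ssm_scan(x, A, B, C):
    """Run a diagonal SSM over a 1-D input sequence.

    x: (T,) input sequence
    A: (N,) diagonal state-transition coefficients (|A| < 1 for stability)
    B: (N,) input projection
    C: (N,) output projection
    Returns y: (T,) output sequence.
    """
    h = np.zeros_like(A)          # hidden state, one value per state dimension
    y = np.empty_like(x)
    for t, x_t in enumerate(x):   # linear in T, unlike quadratic self-attention
        h = A * h + B * x_t       # state update
        y[t] = C @ h              # readout
    return y

# Toy usage: a 16-dimensional state scanned over a length-1000 sequence.
rng = np.random.default_rng(0)
T, N = 1000, 16
y = ssm_scan(rng.standard_normal(T), rng.uniform(0.5, 0.99, N),
             rng.standard_normal(N), rng.standard_normal(N))
print(y.shape)  # (1000,)
```

Because each step depends only on the previous hidden state, memory does not grow with context length, which is what makes this family of models attractive for very long sequences.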
Papers
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Chao Jin, Zili Zhang, Xuanlin Jiang, Fangyue Liu, Xin Liu, Xuanzhe Liu, Xin Jin
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel
Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era
Matteo Tiezzi, Michele Casoni, Alessandro Betti, Tommaso Guidi, Marco Gori, Stefano Melacci
Lissard: Long and Simple Sequential Reasoning Datasets
Mirelle Bueno, Roberto Lotufo, Rodrigo Nogueira
XTSFormer: Cross-Temporal-Scale Transformer for Irregular Time Event Prediction
Tingsong Xiao, Zelin Xu, Wenchong He, Jim Su, Yupu Zhang, Raymond Opoku, Ronald Ison, Jason Petho, Jiang Bian, Patrick Tighe, Parisa Rashidi, Zhe Jiang
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi