Long Sequence
Long sequence modeling focuses on efficiently processing and understanding data with extremely long temporal or spatial dependencies, a challenge for standard architectures such as Transformers, whose self-attention cost grows quadratically with sequence length. Current research emphasizes novel architectures, notably state-space models (SSMs) such as Mamba and its variants, alongside adaptations of Transformers through sparse or linearized attention and efficient caching mechanisms. These advances are crucial for applications including natural language processing, speech recognition, genomics, and image analysis, where long-range dependencies are critical for accurate interpretation and generation. The overarching goal is models that handle arbitrarily long sequences while preserving computational efficiency and accuracy.
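For context on why SSMs scale well to long sequences, the sketch below illustrates the core discretized linear state-space recurrence, h_t = A h_{t-1} + B x_t, y_t = C h_t, which processes a length-L sequence in O(L) time with constant memory per step. This is an illustrative toy, not the Mamba algorithm itself; the dimensions, matrices, and input are made-up placeholders.

```python
import numpy as np

# Minimal sketch of a discretized linear state-space model (SSM) scan.
# Matrices A, B, C and the random input are illustrative placeholders,
# not parameters from any specific Mamba variant.

def ssm_scan(A, B, C, x):
    """Run h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t over a sequence.

    A: (d, d) state transition, B: (d, m) input map, C: (p, d) output map,
    x: (L, m) input sequence. Returns y: (L, p). Cost is O(L) in sequence
    length, versus O(L^2) for full self-attention.
    """
    d = A.shape[0]
    h = np.zeros(d)
    ys = []
    for x_t in x:                    # one step per token: constant work, constant state
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m, p, L = 16, 4, 4, 1000      # state, input, output dims; sequence length
    A = 0.9 * np.eye(d)              # stable transition so the state stays bounded
    B = 0.1 * rng.normal(size=(d, m))
    C = 0.1 * rng.normal(size=(p, d))
    x = rng.normal(size=(L, m))
    y = ssm_scan(A, B, C, x)
    print(y.shape)                   # (1000, 4)
```

Because the state h_t summarizes the entire history in a fixed-size vector, memory does not grow with sequence length; selective SSMs such as Mamba build on this recurrence by making the transition input-dependent.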
Papers
Integrating LSTM and BERT for Long-Sequence Data Analysis in Intelligent Tutoring Systems
Zhaoxing Li, Jujie Yang, Jindi Wang, Lei Shi, Sebastian Stein
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri Narayana Patro, Vijay Srinivas Agneeswaran
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Chao Jin, Zili Zhang, Xuanlin Jiang, Fangyue Liu, Xin Liu, Xuanzhe Liu, Xin Jin
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel
Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov