Long Sequence
Long sequence modeling focuses on efficiently processing and understanding data with very long temporal or spatial dependencies, a challenge for many machine learning models. Current research emphasizes novel architectures, such as state-space models (SSMs) like Mamba and its variants, as well as adaptations of existing models such as Transformers through sparse attention and efficient caching mechanisms. These advances are crucial for applications including natural language processing, speech recognition, and image analysis, where long-range dependencies are essential for accurate interpretation and generation. The ultimate goal is models that handle arbitrarily long sequences while maintaining computational efficiency and accuracy. A minimal state-space recurrence sketch follows below.
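To make the SSM idea concrete, here is a minimal, illustrative sketch (not the Mamba implementation) of a plain diagonal linear state-space recurrence: the hidden state is updated once per time step, so the cost grows linearly with sequence length rather than quadratically as in full attention. Selective SSMs such as Mamba additionally make the transition and projection parameters input-dependent; that refinement is omitted here, and all function names, shapes, and parameter values below are hypothetical choices for illustration.

```python
# Minimal sketch of a diagonal linear SSM recurrence (illustrative only,
# not Mamba). Processes a length-L sequence in O(L) time with a fixed-size
# hidden state, which is why SSMs scale well to long sequences.
import numpy as np

def ssm_scan(x, A, B, C):
    """Run a diagonal linear SSM over a sequence.

    x: (L, d_in)        input sequence
    A: (d_state,)       diagonal state-transition coefficients (|A| < 1 for stability)
    B: (d_state, d_in)  input projection
    C: (d_out, d_state) output projection
    Returns y: (L, d_out)
    """
    L = x.shape[0]
    h = np.zeros(A.shape[0])           # hidden state carried across time steps
    y = np.zeros((L, C.shape[0]))
    for t in range(L):
        h = A * h + B @ x[t]           # state update: h_t = A h_{t-1} + B x_t
        y[t] = C @ h                   # readout:      y_t = C h_t
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d_in, d_state, d_out = 1024, 4, 8, 4
    x = rng.standard_normal((L, d_in))
    A = np.full(d_state, 0.9)                       # stable diagonal dynamics
    B = rng.standard_normal((d_state, d_in)) * 0.1
    C = rng.standard_normal((d_out, d_state)) * 0.1
    y = ssm_scan(x, A, B, C)
    print(y.shape)  # (1024, 4): memory per step is constant in L, unlike attention's O(L^2) score matrix
```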
Papers
Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers
Patrik Zavoral, Dušan Variš, Ondřej Bojar
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Diana Marculescu
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Jingwei Zuo, Maksim Velikanov, Dhia Eddine Rhaiem, Ilyas Chahed, Younes Belkada, Guillaume Kunsch, Hakim Hacid
SPikE-SSM: A Sparse, Precise, and Efficient Spiking State Space Model for Long Sequences Learning
Yan Zhong, Ruoyu Zhao, Chao Wang, Qinghai Guo, Jianguo Zhang, Zhichao Lu, Luziwei Leng
PRF: Parallel Resonate and Fire Neuron for Long Sequence Learning in Spiking Neural Networks
Yulong Huang, Zunchang Liu, Changchun Feng, Xiaopeng Lin, Hongwei Ren, Haotian Fu, Yue Zhou, Hong Xing, Bojun Cheng
Can Mamba Always Enjoy the "Free Lunch"?
Ruifeng Ren, Zhicong Li, Yong Liu
Exploring Learnability in Memory-Augmented Recurrent Neural Networks: Precision, Stability, and Empirical Insights
Shrabon Das, Ankur Mali