Paper ID: 2409.00563
Sparse Mamba: Reinforcing Controllability In Structural State Space Models
Emadeldeen Hamdan, Hongyi Pan, Ahmet Enis Cetin
In this work, we introduce the concepts of controllability and observability into the Mamba SSM architecture in our Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications. Recent structured state space models (SSMs), such as Mamba and Mamba2, have outperformed transformers and large language models at small to medium scale while addressing their computational inefficiency. The Mamba SSM architecture removes the need for the attention layers and multilayer perceptron blocks used in transformers. However, current Mamba models do not reinforce controllability in the state-space equations when computing the $A$, $B$, $C$, and $D$ matrices at each time step, which increases complexity and computational cost. In this paper, we demonstrate a reduction in parameters compared with the original Mamba and Mamba2. After reinforcing controllability and observability in the original Mamba architecture, our proposed S-Mamba improves perplexity by 5\% and reduces training time by 3\%. The controllable $n \times n$ state matrix $A$ is sparse and has only $n$ free parameters. Our approach ensures a controllable system, which we believe will be the gateway to Mamba3.
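The claim that a controllable $n \times n$ state matrix $A$ can be sparse with only $n$ free parameters can be illustrated with the controllable canonical (companion) form from classical control theory. The sketch below is a minimal illustration under that assumption; it is not the S-Mamba implementation, and whether S-Mamba uses exactly this layout is not stated here. All function names (`companion_matrix`, `controllability_matrix`, `rank`, `is_controllable`) are hypothetical. It builds $A$ with fixed ones on the super-diagonal and the $n$ free coefficients in the last row, then checks the Kalman rank condition $\operatorname{rank}[B, AB, \dots, A^{n-1}B] = n$ using exact rational arithmetic.

```python
from fractions import Fraction


def companion_matrix(coeffs):
    """Build an n x n matrix in controllable canonical (companion) form.

    coeffs = [a_0, ..., a_{n-1}] are the only free parameters: they fill the
    last row (negated); the super-diagonal holds fixed ones and every other
    entry is zero, so A has at most (n - 1) + n nonzeros.
    """
    n = len(coeffs)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = Fraction(1)
    A[n - 1] = [Fraction(-c) for c in coeffs]
    return A


def mat_vec(A, v):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def controllability_matrix(A, b):
    """Kalman controllability matrix [b, Ab, ..., A^{n-1} b], as a list of rows."""
    n = len(A)
    cols = [b]
    for _ in range(n - 1):
        cols.append(mat_vec(A, cols[-1]))
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def rank(M):
    """Exact rank via Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def is_controllable(A, b):
    """Kalman rank test: (A, b) is controllable iff the matrix has full rank n."""
    return rank(controllability_matrix(A, b)) == len(A)


if __name__ == "__main__":
    A = companion_matrix([1, -2, 3, 5])  # 4 free parameters
    b = [0, 0, 0, 1]                     # input enters the last state
    print(is_controllable(A, b))         # True for any choice of coefficients
```

With $b = e_n$, the controllability matrix of a companion-form $A$ has ones on its anti-diagonal and zeros above it, so it has full rank regardless of the $n$ coefficients. That is why this form guarantees controllability while storing only $n$ trainable values. Observability can be checked the same way by applying the rank test to $A^\top$ and $C^\top$.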
Submitted: Aug 31, 2024