Visual State Space Model
Visual state space models (VSSMs) are a class of deep learning architectures that process images as sequences of patches using structured state space models, combining the efficient, recurrent latent-state updates of RNN-style inference with computational cost that scales linearly in sequence length. Current research focuses on improving the performance and efficiency of VSSMs in computer vision tasks, particularly through novel scanning strategies (e.g., fractal, atrous, windowed, and multi-scale scans) within architectures such as Mamba and its variants (e.g., ZeroMamba, GroupMamba). This approach offers a compelling alternative to transformers, whose self-attention cost grows quadratically with the number of patches, with applications ranging from image classification and object detection to more specialized tasks such as image deblurring, crowd counting, and remote sensing image analysis.
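The sketch below is a minimal, illustrative take on the two ideas above: image patches are flattened into 1-D sequences under several scan orders (row-major, column-major, and their reverses, loosely mimicking multi-directional scanning), and each sequence is processed by a toy linear state space recurrence. All names (scan_orders, ssm_scan), dimensions, and the mean-fusion step are assumptions for illustration; real Mamba-style blocks use input-dependent (selective) parameters and a hardware-efficient parallel scan rather than this sequential loop.

import numpy as np

def scan_orders(h, w):
    """Return several 1-D visitation orders over an h x w patch grid.

    Illustrative only: VSSM-style models combine multiple such scans
    (e.g., row-major, column-major, and their reverses) so every patch
    receives context from different directions.
    """
    idx = np.arange(h * w).reshape(h, w)
    row_major = idx.reshape(-1)        # left-to-right, top-to-bottom
    col_major = idx.T.reshape(-1)      # top-to-bottom, left-to-right
    return [row_major, row_major[::-1], col_major, col_major[::-1]]

def ssm_scan(x, A, B, C):
    """Run a toy linear state space recurrence over a patch sequence.

    x : (T, d_in) patch features
    h_t = A @ h_{t-1} + B @ x_t,   y_t = C @ h_t
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, d_in, d_state, d_out = 4, 4, 8, 16, 8
    patches = rng.standard_normal((H * W, d_in))   # flattened patch features
    A = 0.9 * np.eye(d_state)                      # stable toy transition
    B = 0.1 * rng.standard_normal((d_state, d_in))
    C = 0.1 * rng.standard_normal((d_out, d_state))
    # Process the patches under each scan order, map outputs back to
    # grid order, and fuse them by averaging.
    outputs = []
    for order in scan_orders(H, W):
        y = ssm_scan(patches[order], A, B, C)
        outputs.append(y[np.argsort(order)])       # undo the scan permutation
    fused = np.mean(outputs, axis=0)
    print(fused.shape)  # (16, 8): one output vector per patch

Because each scan visits the patches in a different order, fusing the per-scan outputs gives every patch access to context from multiple directions; the papers listed below vary mainly in how these scan paths are constructed (fractal, windowed, multi-scale, locally reordered) and how the resulting features are combined.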
Papers
Exploring Robustness of Visual State Space model against Backdoor Attacks
Cheng-Yi Lee, Cheng-Chang Tsai, Chia-Mu Yu, Chun-Shien Lu
MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering
Yonglin Tian, Songlin Bai, Zhiyao Luo, Yutong Wang, Yisheng Lv, Fei-Yue Wang
Scalable Visual State Space Model with Fractal Scanning
Lv Tang, HaoKe Xiao, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Bo Li
Efficient Visual State Space Model for Image Deblurring
Lingshun Kong, Jiangxin Dong, Ming-Hsuan Yang, Jinshan Pan
Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model
Yuheng Shi, Minjing Dong, Chang Xu