Paper ID: 2410.03962

SpecSAR-Former: A Lightweight Transformer-based Network for Global LULC Mapping Using Integrated Sentinel-1 and Sentinel-2

Hao Yu, Gen Li, Haoyu Liu, Songyan Zhu, Wenquan Dong, Changjian Li

Recent approaches in remote sensing have increasingly focused on multimodal data, driven by the growing availability of diverse earth observation datasets. Integrating complementary information from different modalities has shown substantial potential in enhancing semantic understanding. However, existing global multimodal datasets often lack the inclusion of Synthetic Aperture Radar (SAR) data, which excels at capturing texture and structural details. SAR, as a complementary perspective to other modalities, facilitates the utilization of spatial information for global land use and land cover (LULC). To address this gap, we introduce the Dynamic World+ dataset, expanding the current authoritative multispectral dataset, Dynamic World, with aligned SAR data. Additionally, to facilitate the combination of multispectral and SAR data, we propose a lightweight transformer architecture termed SpecSAR-Former. It incorporates two innovative modules, Dual Modal Enhancement Module (DMEM) and Mutual Modal Aggregation Module (MMAM), designed to exploit cross-information between the two modalities in a split-fusion manner. These modules enhance the model's ability to integrate spectral and spatial information, thereby improving the overall performance of global LULC semantic segmentation. Furthermore, we adopt an imbalanced parameter allocation strategy that assigns parameters to different modalities based on their importance and information density. Extensive experiments demonstrate that our network outperforms existing transformer and CNN-based models, achieving a mean Intersection over Union (mIoU) of 59.58%, an Overall Accuracy (OA) of 79.48%, and an F1 Score of 71.68% with only 26.70M parameters. The code will be available at this https URL.

Submitted: Oct 4, 2024