Paper ID: 2503.01210 • Published Mar 3, 2025
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Guanyao Wu, Haoyu Liu, Hongming Fu, Yichuan Peng, Jinyuan Liu, Xin Fan, Risheng Liu
Dalian University of Technology
Multi-modality image fusion, particularly infrared and visible, plays a
crucial role in integrating diverse modalities to enhance scene understanding.
Although early research prioritized visual quality, preserving fine details and
adapting to downstream tasks remain challenging. Recent approaches attempt
task-specific design but rarely achieve "The Best of Both Worlds" due to
inconsistent optimization goals. To address these issues, we propose a novel
method that leverages the semantic knowledge from the Segment Anything Model
(SAM) to Grow the quality of fusion results and Enable downstream task
adaptability, namely SAGE. Specifically, we design a Semantic Persistent
Attention (SPA) Module that efficiently maintains source information via the
persistent repository while extracting high-level semantic priors from SAM.
More importantly, to eliminate the impractical dependence on SAM during
inference, we introduce a bi-level optimization-driven distillation mechanism
with triplet losses, which allows the student network to extract knowledge
effectively. Extensive experiments show that our method achieves a balance
between high-quality visual results and downstream task adaptability while
maintaining practical deployment efficiency. The code is available at
this https URL
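The abstract does not specify the exact form of the triplet losses used in the distillation mechanism. As a rough illustration of the general idea, the sketch below implements a standard triplet margin loss, where the student's features (anchor) are pulled toward the frozen teacher's SAM-derived features (positive) and pushed away from features of a mismatched sample (negative). All names and the margin value are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def triplet_distill_loss(student_feat, teacher_feat, negative_feat, margin=1.0):
    """Generic triplet margin loss for feature distillation (illustrative only).

    student_feat:  features from the student network (anchor)
    teacher_feat:  frozen teacher (e.g., SAM) features (positive)
    negative_feat: features of a mismatched sample (negative)
    """
    d_pos = np.linalg.norm(student_feat - teacher_feat)  # anchor-positive distance
    d_neg = np.linalg.norm(student_feat - negative_feat)  # anchor-negative distance
    # Hinge: penalize only when the positive is not closer by at least `margin`
    return max(0.0, d_pos - d_neg + margin)

# Toy usage: student already matches the teacher, negative is far away
s = np.array([0.0, 0.0])
t = np.array([0.0, 0.0])
n = np.array([2.0, 0.0])
print(triplet_distill_loss(s, t, n))  # 0.0, margin satisfied
```

In practice such a term would be combined with pixel-level fusion losses and optimized jointly, e.g. within the bi-level scheme the paper describes, but those details are not given in the abstract.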