Paper ID: 2308.06603
LadleNet: A Two-Stage UNet for Infrared Image to Visible Image Translation Guided by Semantic Segmentation
Tonghui Zou, Lei Chen
The translation of thermal infrared (TIR) images into visible light (VI) images plays a critical role in enhancing model performance and generalization capability, particularly in various fields such as registration and fusion of TIR and VI images. However, current research in this field faces challenges of insufficiently realistic image quality after translation and the difficulty of existing models in adapting to unseen scenarios. In order to develop a more generalizable image translation architecture, we conducted an analysis of existing translation architectures. By exploring the interpretability of intermediate modalities in existing translation architectures, we found that the intermediate modality in the image translation process for street scene images essentially performs semantic segmentation, distinguishing street images based on background and foreground patterns before assigning color information. Based on these principles, we propose an improved algorithm based on U-net called LadleNet. This network utilizes a two-stage U-net concatenation structure, consisting of Handle and Bowl modules. The Handle module is responsible for constructing an abstract semantic space, while the Bowl module decodes the semantic space to obtain the mapped VI image. Due to the characteristic of semantic segmentation, the Handle module has strong extensibility. Therefore, we also propose LadleNet+, which replaces the Handle module in LadleNet with a pre-trained DeepLabv3+ network, enabling the model to have a more powerful capability in constructing semantic space. The proposed methods were trained and tested on the KAIST dataset, followed by quantitative and qualitative analysis. Compared to existing methods, LadleNet and LadleNet+ achieved an average improvement of 12.4% and 15.2% in SSIM metrics, and 37.9% and 50.6% in MS-SSIM metrics, respectively.
Submitted: Aug 12, 2023