Paper ID: 2410.23898
Temporal and Spatial Super Resolution with Latent Diffusion Model in Medical MRI images
Vishal Dubey
Super Resolution (SR) plays a critical role in computer vision, particularly in medical imaging, where hardware and acquisition-time constraints often result in low spatial and temporal resolution. While diffusion models have been applied to both spatial and temporal SR, few studies have explored their use for joint spatial and temporal SR, particularly in medical imaging. In this work, we address this gap by proposing a Latent Diffusion Model (LDM) combined with a Vector Quantised GAN (VQGAN)-based encoder-decoder architecture for joint super resolution. We frame SR as an image denoising problem, focusing on improving both spatial and temporal resolution in medical images. We demonstrate the effectiveness of our approach on the cardiac MRI dataset from the Data Science Bowl Cardiac Challenge, which consists of 2D cine images with a spatial resolution of 256x256 and 8-14 slices per time step. Our LDM achieves a Peak Signal-to-Noise Ratio (PSNR) of 30.37, a Structural Similarity Index (SSIM) of 0.7580, and a Learned Perceptual Image Patch Similarity (LPIPS) of 0.2756, outperforming a simple baseline method by 5% in PSNR, 6.5% in SSIM, and 39% in LPIPS. It generates images with high fidelity and perceptual quality with 15 diffusion steps. These results suggest that LDMs hold promise for advancing super resolution in medical imaging, potentially enhancing diagnostic accuracy and patient outcomes. A link to the code is also shared.
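For readers unfamiliar with the first of the reported metrics, the following is a minimal sketch of how PSNR is conventionally computed between a reference image and a reconstruction. It is not taken from the paper's code: the function name `psnr`, the `data_range` parameter, and the synthetic 256x256 arrays are assumptions made for illustration, and the paper's own normalization may differ.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak Signal-to-Noise Ratio in dB (hypothetical helper, not the paper's code).

    data_range is the assumed maximum possible intensity span of the images,
    e.g. 1.0 for images scaled to [0, 1] or 255 for 8-bit images.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")

    # Mean squared error over all pixels.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


# Synthetic example: a constant error of 0.1 on a [0, 1] image gives
# MSE = 0.01, so PSNR = 10 * log10(1 / 0.01) = 20 dB.
reference = np.zeros((256, 256))
estimate = reference + 0.1
value = psnr(reference, estimate)
```

Higher PSNR and SSIM indicate a closer match to the reference, whereas lower LPIPS is better, so an improvement in LPIPS corresponds to a decrease in its value.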
Submitted: Oct 29, 2024