Paper ID: 2411.15490
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation
Junhyeok Lee, Yujin Oh, Dahyoun Lee, Hyon Keun Joh, Chul-Ho Sohn, Sung Hyun Baik, Cheol Kyu Jung, Jung Hyun Park, Kyu Sung Choi, Byung-Hoon Kim, Jong Chul Ye
Acute ischemic stroke (AIS) requires time-critical management, with hours of delayed intervention leading to an irreversible disability of the patient. Since diffusion weighted imaging (DWI) using the magnetic resonance image (MRI) plays a crucial role in the detection of AIS, automated prediction of AIS from DWI has been a research topic of clinical importance. While text radiology reports contain the most relevant clinical information from the image findings, the difficulty of mapping across different modalities has limited the factuality of conventional direct DWI-to-report generation methods. Here, we propose paired image-domain retrieval and text-domain augmentation (PIRTA), a cross-modal retrieval-augmented generation (RAG) framework for providing clinician-interpretative AIS radiology reports with improved factuality. PIRTA mitigates the need for learning cross-modal mapping, which poses difficulty in image-to-text generation, by casting the cross-modal mapping problem as an in-domain retrieval of similar DWI images that have paired ground-truth text radiology reports. By exploiting the retrieved radiology reports to augment the report generation process of the query image, we show by experiments with extensive in-house and public datasets that PIRTA can accurately retrieve relevant reports from 3D DWI images. This approach enables the generation of radiology reports with significantly higher accuracy compared to direct image-to-text generation using state-of-the-art multimodal language models.
Submitted: Nov 23, 2024