Paper ID: 2410.07111 • Published Sep 20, 2024
Utility of Multimodal Large Language Models in Analyzing Chest X-ray with Incomplete Contextual Information
Choonghan Kim, Seonhee Cho, Joo Heung Yoon
Background: Large language models (LLMs) are gaining use in clinical
settings, but their performance can suffer with incomplete radiology reports.
We tested whether multimodal LLMs (using text and images) could improve
accuracy and understanding in chest radiography reports, making them more
effective for clinical decision support.
Purpose: To assess the robustness of LLMs in generating accurate impressions
from chest radiography reports using both incomplete data and multimodal data.
Materials and Methods: We used 300 radiology image-report pairs from the
MIMIC-CXR database. Three LLMs (OpenFlamingo, MedFlamingo, IDEFICS) were tested
in both text-only and multimodal formats. Impressions were first generated from
the full text, then generated again after removing 20%, 50%, and 80% of the
text. The impact of adding images was evaluated using chest X-rays, and model
performance was compared using three metrics (ROUGE-L, F1RadGraph, F1CheXbert)
with statistical analysis.
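The text-removal step and a ROUGE-L style score can be illustrated with a minimal sketch. The abstract does not state how text was removed, so dropping random whitespace-separated tokens is an assumption here, as are the function names `remove_fraction` and `rouge_l_f1` and the sample report. The score is a plain F1 over the longest common subsequence, not the authors' evaluation code.

```python
import random


def remove_fraction(text, fraction, seed=0):
    """Drop roughly `fraction` of the whitespace-separated tokens, keeping order.

    Assumption: token-level random removal; the paper may have removed
    sentences or truncated the report instead.
    """
    tokens = text.split()
    n_drop = round(len(tokens) * fraction)
    rng = random.Random(seed)
    drop = set(rng.sample(range(len(tokens)), n_drop))
    return " ".join(t for i, t in enumerate(tokens) if i not in drop)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Simplified ROUGE-L: F1 of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical report text, for illustration only.
report = (
    "The lungs are clear without focal consolidation. "
    "No pleural effusion or pneumothorax is seen. "
    "Heart size is normal."
)

if __name__ == "__main__":
    for fraction in (0.2, 0.5, 0.8):
        ablated = remove_fraction(report, fraction)
        print(f"{fraction:.0%} removed -> {ablated}")
```

In a full pipeline, each ablated report would be passed to a model (with or without the paired image), and the generated impression would be scored against the reference impression with `rouge_l_f1` or an established ROUGE implementation.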
Results: Among the text-only models (OpenFlamingo, MedFlamingo, IDEFICS),
MedFlamingo and IDEFICS performed similarly, while OpenFlamingo performed best
on complete text (ROUGE-L: 0.39 vs. 0.21 vs. 0.21; F1RadGraph: 0.34 vs. 0.17
vs. 0.17; F1CheXbert: 0.53 vs. 0.40 vs. 0.40; p<0.001). Performance declined
with incomplete data
across all models. However, adding images significantly boosted the performance
of MedFlamingo and IDEFICS (p<0.001), equaling or surpassing OpenFlamingo, even
with incomplete text.
Conclusion: LLMs may produce low-quality outputs with incomplete radiology
data, but multimodal LLMs can improve reliability and support clinical
decision-making.
Keywords: large language model; multimodal; semantic analysis; chest
radiography; clinical decision support