Medical Vision-Language Pre-training

Medical Vision-Language Pre-training (MedVLP) aims to build robust models that jointly understand medical images and their associated text by training on large datasets of paired image-text data. Current research focuses on three areas: making these models robust to diverse textual prompts, mitigating the effects of noisy data and adversarial attacks, and improving the alignment between visual and textual representations through techniques such as contrastive learning and knowledge augmentation with large language models. The field is significant because it promises more efficient and accurate medical image analysis, which could in turn advance diagnosis, treatment planning, and other clinical applications.
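
As a rough illustration of how contrastive learning aligns images and text, the sketch below computes a symmetric, CLIP-style contrastive (InfoNCE) loss over a batch of paired embeddings. It is a minimal example under stated assumptions, not the method of any particular MedVLP paper: the function names, the random placeholder embeddings, and the temperature of 0.07 are all hypothetical choices made for illustration, and a real system would produce the embeddings with trained image and text encoders.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of paired embeddings.

    Row i of image_emb and row i of text_emb are assumed to come from
    the same image-text pair; all other rows act as negatives.
    The temperature value is an illustrative assumption.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # logits[i, j] = scaled similarity between image i and text j.
    logits = image_emb @ text_emb.T / temperature
    idx = np.arange(logits.shape[0])

    # Image-to-text: each image should pick out its own text (row-wise).
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each text should pick out its own image (column-wise).
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)


# Placeholder embeddings standing in for encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned = contrastive_loss(emb, emb)                       # pairs match
shuffled = contrastive_loss(emb, np.roll(emb, 1, axis=0))  # pairs mismatched
print(f"aligned: {aligned:.4f}, shuffled: {shuffled:.4f}")
```

The loss is low when each image embedding is closest to its own text embedding and high when the pairs are mismatched, which is the signal that pulls the two modalities into a shared space during pre-training.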

Papers