Medical Visual Question Answering
Medical Visual Question Answering (Med-VQA) focuses on developing AI systems that can accurately answer questions about medical images, aiding diagnosis and treatment. Current research emphasizes large vision-language models (LVLMs), often combined with techniques such as prompt engineering, self-supervised learning, and multimodal contrastive learning to improve accuracy and mitigate issues like hallucination and data scarcity. The field's significance lies in its potential to assist medical professionals by automating image interpretation, improving diagnostic efficiency, and supporting more informed decision-making. However, developing robust evaluation methods and addressing biases in training data remain crucial open challenges.
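To make the multimodal contrastive learning mentioned above concrete: the common form of this objective is a symmetric InfoNCE loss that pulls paired image and text embeddings together while pushing apart mismatched pairs within a batch. The sketch below is a minimal, generic PyTorch illustration of that idea, not the specific loss from any paper listed here; the function name, temperature value, and embedding shapes are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(image_embeds: torch.Tensor,
                                text_embeds: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_embeds, text_embeds: (batch_size, embed_dim) outputs of the
    vision and text encoders; row i of each tensor is assumed to be a
    matching image-text pair (e.g., a radiograph and its question/report).
    """
    # L2-normalize so dot products are cosine similarities.
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Pairwise similarity matrix; diagonal entries are the positive pairs.
    logits = image_embeds @ text_embeds.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions (image->text and text->image).
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

In Med-VQA pipelines, a loss of this shape is typically used during pre-training to align the image and text encoders before fine-tuning on question answering.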
Papers
Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting
Chantal Pellegrini, Matthias Keicher, Ege Özsoy, Nassir Navab
Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering
Pengfei Li, Gang Liu, Jinlong He, Zixu Zhao, Shenjun Zhong