Medical VQA
Medical Visual Question Answering (Med-VQA) aims to develop AI systems that answer questions about medical images, supporting clinicians in diagnosis and decision-making. Current research focuses on improving model robustness and reliability through techniques such as contrastive learning, adversarial training, and advanced attention mechanisms within large vision-language models (LVLMs). However, studies continue to highlight significant limitations in current models' handling of nuanced medical questions, particularly those requiring fine-grained diagnostic reasoning, underscoring the need for more rigorous evaluation and improved architectures. The ultimate goal is Med-VQA systems accurate and dependable enough to strengthen medical practice and improve patient care.
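The summary above names contrastive learning as one technique for aligning medical images with text in LVLMs. As a concrete illustration, here is a minimal sketch of a symmetric InfoNCE-style image-text contrastive loss of the kind used in CLIP-style vision-language pre-training; the function name, batch size, embedding width, and temperature are illustrative assumptions, not details from any paper listed below.

    import torch
    import torch.nn.functional as F

    def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
        """Symmetric InfoNCE loss over a batch of matched image-text pairs."""
        # Normalize so the dot product is a cosine similarity.
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        # logits[i, j] = similarity between image i and text j.
        logits = image_emb @ text_emb.t() / temperature
        # Matched image-text pairs lie on the diagonal of the similarity matrix.
        targets = torch.arange(logits.size(0), device=logits.device)
        # Pull matched pairs together and push mismatched pairs apart,
        # in both the image-to-text and text-to-image directions.
        loss_i2t = F.cross_entropy(logits, targets)
        loss_t2i = F.cross_entropy(logits.t(), targets)
        return (loss_i2t + loss_t2i) / 2

    # Hypothetical batch: 8 radiology images and their matching report
    # sentences, each already encoded into 512-d embeddings by separate encoders.
    image_emb = torch.randn(8, 512)
    text_emb = torch.randn(8, 512)
    print(contrastive_alignment_loss(image_emb, text_emb))

In practice the embeddings would come from the model's image and text encoders, and this loss would be one term in a larger training objective rather than a complete recipe.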
Papers
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
Peng Xia, Kangyu Zhu, Haoran Li, Tianze Wang, Weijia Shi, Sheng Wang, Linjun Zhang, James Zou, Huaxiu Yao
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
João Matos, Shan Chen, Siena Placino, Yingya Li, Juan Carlos Climent Pardo, Daphna Idan, Takeshi Tohyama, David Restrepo, Luis F. Nakayama, Jose M. M. Pascual-Leone, Guergana Savova, Hugo Aerts, Leo A. Celi, A. Ian Wong, Danielle S. Bitterman, Jack Gallifant