Image Modality
Image modality research studies how image data, on its own or combined with other modalities such as text, can be used effectively across a range of tasks. Current work emphasizes better integration and fusion of information across modalities, typically with transformer-based architectures and techniques such as knowledge distillation or Shapley value-based attribution to improve model interpretability and performance. This line of research matters because it targets known weaknesses of existing models, such as hallucination and weak compositional understanding, and thereby enables more robust and reliable applications in fields as varied as medical image analysis, group activity recognition, and image generation.
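To make the Shapley value-based fusion idea concrete: each modality is treated as a player in a cooperative game, and its contribution is its average marginal effect on the fused model's score across all coalitions of the other modalities. Below is a minimal, self-contained sketch in plain Python; the modality names and the `score` lookup table are invented for illustration and are not taken from any of the listed papers, which use their own models and fusion schemes.

```python
from itertools import combinations
from math import factorial

# Hypothetical modality set; real systems might use e.g. image + text.
MODALITIES = ("image", "text", "stain")

def score(subset):
    """Stand-in for a fused model evaluated with only `subset` of the
    modalities available (others masked out). Values are made up."""
    table = {
        frozenset(): 0.10,
        frozenset({"image"}): 0.55,
        frozenset({"text"}): 0.40,
        frozenset({"stain"}): 0.30,
        frozenset({"image", "text"}): 0.78,
        frozenset({"image", "stain"}): 0.70,
        frozenset({"text", "stain"}): 0.52,
        frozenset({"image", "text", "stain"}): 0.90,
    }
    return table[frozenset(subset)]

def shapley_values(players, value_fn):
    """Exact Shapley values by enumerating all coalitions; tractable for
    the handful of modalities typical in fusion models."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                marginal = value_fn(set(coalition) | {p}) - value_fn(set(coalition))
                total += weight * marginal
        values[p] = total
    return values

if __name__ == "__main__":
    contrib = shapley_values(MODALITIES, score)
    for modality, v in sorted(contrib.items(), key=lambda kv: -kv[1]):
        print(f"{modality:>6}: {v:+.3f}")
    # By the efficiency property, the contributions sum to
    # score(all modalities) - score(no modalities).
    print("sum:", round(sum(contrib.values()), 3), "==", score(MODALITIES) - score(()))
```

Exact enumeration costs 2^n model evaluations, which is fine for two or three modalities; with many players, practical systems fall back on sampled or kernel-based Shapley approximations.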
Papers
Visual Perception in Text Strings
Qi Jia, Xiang Yue, Shanshan Huang, Ziheng Qin, Yizhu Liu, Bill Yuchen Lin, Yang You
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
Kenza Amara, Lukas Klein, Carsten Lüth, Paul Jäger, Hendrik Strobelt, Mennatallah El-Assady
SHAP-CAT: An interpretable multi-modal framework enhancing WSI classification via virtual staining and Shapley-value-based multimodal fusion
Jun Wang, Yu Mao, Nan Guan, Chun Jason Xue