Modality Bias
Modality bias is the tendency of a multimodal model to over-rely on a single input modality (e.g., visual or textual information) and underuse the others, which undermines the integration of complementary information sources. Current research focuses on identifying and mitigating this bias through techniques such as modality importance scoring, chain-of-thought prompting with large language models, and loss functions that encourage balanced modality utilization. Addressing modality bias matters for the robustness and accuracy of multimodal systems in applications such as video question answering, pedestrian detection, and speech recognition, particularly when one modality is degraded or missing.
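As a rough illustration of what a modality importance score can look like, the sketch below uses a simple ablation approach: zero out one modality at a time and measure how much the model's output changes. This is only one possible way to define such a score and is not taken from any of the papers listed here; the names `modality_importance` and `toy_predict`, and the toy model itself, are hypothetical.

```python
from typing import Callable, Dict, List

Features = Dict[str, List[float]]


def modality_importance(
    predict: Callable[[Features], float], inputs: Features
) -> Dict[str, float]:
    """Ablation-based importance (illustrative assumption, not a published method).

    For each modality, replace its features with zeros and record the
    absolute change in the model's output, then normalize the changes
    so the scores sum to 1.
    """
    baseline = predict(inputs)
    changes: Dict[str, float] = {}
    for name, values in inputs.items():
        ablated = dict(inputs)
        ablated[name] = [0.0] * len(values)  # remove this modality's signal
        changes[name] = abs(baseline - predict(ablated))

    total = sum(changes.values())
    if total == 0:
        # The output does not depend on any single modality.
        return {name: 0.0 for name in changes}
    return {name: change / total for name, change in changes.items()}


def toy_predict(x: Features) -> float:
    # Hypothetical toy model that leans heavily on the text modality.
    return 0.9 * sum(x["text"]) + 0.1 * sum(x["video"])


scores = modality_importance(
    toy_predict, {"text": [1.0, 1.0], "video": [1.0, 1.0]}
)
print(scores)
```

For this toy model the text modality receives roughly 0.9 of the total importance and video roughly 0.1, which is the kind of imbalance a modality-bias analysis is meant to surface.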
Papers
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee
YYDS: Visible-Infrared Person Re-Identification with Coarse Descriptions
Yunhao Du, Zhicheng Zhao, Fei Su