Modality Bias

Modality bias, the tendency of multimodal models to over-rely on a single data modality (e.g., visual or textual information), hinders the effective integration of diverse information sources. Current research focuses on identifying and mitigating this bias through techniques such as modality importance scoring, chain-of-thought prompting with large language models, and novel loss functions that encourage balanced modality utilization. Addressing modality bias is crucial for improving the robustness and accuracy of multimodal systems in applications such as video question answering, pedestrian detection, and speech recognition.
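
As a rough illustration of what modality importance scoring can look like, the sketch below measures how much accuracy drops when one modality is zeroed out. This ablation-style score is only one simple possibility and is not taken from any specific paper; the classifier `toy_predict`, its fusion weights, and the four samples are all hypothetical and chosen so that the model leans heavily on text.

```python
from typing import Callable, Dict, List

Sample = Dict[str, List[float]]  # modality name -> feature vector


def accuracy(predict: Callable[[Sample], int],
             samples: List[Sample], labels: List[int]) -> float:
    """Fraction of samples whose prediction matches the label."""
    correct = sum(1 for s, y in zip(samples, labels) if predict(s) == y)
    return correct / len(labels)


def mask_modality(sample: Sample, modality: str) -> Sample:
    """Return a copy of the sample with one modality replaced by zeros."""
    masked = dict(sample)
    masked[modality] = [0.0] * len(sample[modality])
    return masked


def modality_importance(predict: Callable[[Sample], int],
                        samples: List[Sample],
                        labels: List[int]) -> Dict[str, float]:
    """Score each modality by the accuracy lost when it is masked."""
    base = accuracy(predict, samples, labels)
    scores = {}
    for modality in samples[0]:
        ablated = [mask_modality(s, modality) for s in samples]
        scores[modality] = base - accuracy(predict, ablated, labels)
    return scores


def toy_predict(sample: Sample) -> int:
    """Hypothetical biased classifier: text dominates the fused score."""
    score = 0.9 * sum(sample["text"]) + 0.1 * sum(sample["vision"])
    return 1 if score > 0 else 0


# Hypothetical data: text features track the label, vision features do not.
samples = [
    {"text": [1.0, 0.5], "vision": [-0.2, 0.1]},
    {"text": [-1.0, -0.5], "vision": [-0.3, 0.1]},
    {"text": [0.8, 0.1], "vision": [-0.1, 0.4]},
    {"text": [-0.6, -0.2], "vision": [0.5, -0.1]},
]
labels = [1, 0, 1, 0]

scores = modality_importance(toy_predict, samples, labels)
print(scores)
```

On this toy data, masking text costs accuracy while masking vision costs nothing, which is the kind of imbalance such a score is meant to expose. A real evaluation would use a trained model and a held-out dataset, and might prefer a different masking strategy (for example, shuffling a modality across samples) over zeroing.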

Papers