Unimodal Bias
Unimodal bias in multimodal machine learning refers to the tendency of models to over-rely on a single input modality (e.g., text or images) while neglecting others, degrading performance on tasks that require integrating information from multiple sources. Current research focuses on identifying and mitigating this bias through techniques such as causal analysis of model predictions, architectural modifications that control how modalities are fused (e.g., varying the depth at which fusion occurs), and the construction of more balanced benchmark datasets that cannot be solved from one modality alone. Addressing unimodal bias is crucial for improving the reliability and robustness of multimodal models in applications such as misinformation detection and visual question answering, ultimately leading to more accurate and trustworthy AI systems.
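A common way to detect unimodal bias is a modality-ablation test: zero out one modality at a time and compare accuracy against the full model. The sketch below is a minimal, hypothetical illustration on synthetic data (the feature dimensions, the fixed linear weights, and the decision threshold are all assumptions, not a real trained model): the "text" features carry the label while the "image" features are noise, so the ablation reveals that the fused classifier depends almost entirely on text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset: the "text" modality determines the label, while the
# "image" modality is pure noise -- an extreme case in which a fused model
# can succeed while effectively ignoring images.
n = 1000
labels = rng.integers(0, 2, size=n)
text_feats = labels[:, None] + 0.1 * rng.normal(size=(n, 4))   # informative
image_feats = rng.normal(size=(n, 4))                          # uninformative

# Trivial late-fusion "model": sum of per-modality linear scores with fixed
# weights (a hypothetical stand-in for a trained multimodal classifier).
w_text = np.ones(4)
w_image = np.ones(4)

def predict(text, image):
    score = text @ w_text + image @ w_image
    return (score > 2.0).astype(int)   # threshold chosen for this toy data

def accuracy(pred):
    return float((pred == labels).mean())

# Ablation diagnostic: zero out one modality at a time and compare accuracy.
full = accuracy(predict(text_feats, image_feats))
no_image = accuracy(predict(text_feats, np.zeros_like(image_feats)))
no_text = accuracy(predict(np.zeros_like(text_feats), image_feats))

print(f"full: {full:.2f}  text-only: {no_image:.2f}  image-only: {no_text:.2f}")
# Accuracy survives ablating images but collapses to chance when text is
# ablated -- the signature of a model that relies on a single modality.
```

A balanced benchmark would show the opposite signature: ablating either modality alone should hurt accuracy substantially, because neither suffices on its own.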