Modality Competition

Modality competition in multimodal learning is the phenomenon where, during joint training of a model on multiple data types (e.g., audio and video), one modality dominates optimization and the others remain under-trained, so the joint model falls short of what its modalities could achieve together. Current research aims to mitigate this through training strategies such as alternating learning paradigms, which update one modality at a time, and gradient modulation techniques, which rescale each modality's gradients; both seek to decouple the modalities or balance their contributions. Overcoming modality competition matters for the accuracy and robustness of multimodal systems in applications ranging from medical image analysis to sentiment analysis.
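To make the idea of gradient modulation concrete, the following is a minimal sketch of one possible balancing rule for two modalities. It is an illustration under stated assumptions, not the method of any particular paper: the function names (`modulation_coefficients`, `scale_gradients`), the per-modality scores, the `alpha` parameter, and the tanh-based damping are all hypothetical choices made for this example.

```python
import math


def modulation_coefficients(score_a, score_b, alpha=1.0):
    """Return gradient scale factors (k_a, k_b) for two modalities.

    score_a, score_b: positive scalars measuring how well each modality
    performs on its own for the current batch (an assumed measure, e.g.
    the mean probability assigned to the correct class).
    alpha: assumed hyperparameter controlling how strongly the dominant
    modality is damped.
    """
    if score_a <= 0 or score_b <= 0:
        raise ValueError("scores must be positive")
    ratio = score_a / score_b
    if ratio > 1.0:
        # Modality A dominates: shrink its gradients, leave B untouched.
        return 1.0 - math.tanh(alpha * (ratio - 1.0)), 1.0
    if ratio < 1.0:
        # Modality B dominates: shrink its gradients, leave A untouched.
        return 1.0, 1.0 - math.tanh(alpha * (1.0 / ratio - 1.0))
    # Balanced modalities: no modulation.
    return 1.0, 1.0


def scale_gradients(grads, k):
    """Multiply every gradient value of one modality's encoder by k."""
    return [k * g for g in grads]


# Hypothetical batch: audio is far more confident than video.
k_audio, k_video = modulation_coefficients(score_a=0.9, score_b=0.3)
audio_grads = scale_gradients([0.5, -0.2], k_audio)  # damped update
video_grads = scale_gradients([0.1, 0.4], k_video)   # unchanged update
```

In this sketch only the stronger modality is slowed down, which gives the weaker one more room to learn. An alternating paradigm would instead update one modality's parameters per step while holding the other's fixed.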

Papers