Multi-Modal Fusion
Multi-modal fusion integrates information from diverse data sources (e.g., images, text, sensor readings) to improve the accuracy and robustness of machine learning models. Current research relies heavily on transformer architectures and attention mechanisms, along with newer approaches such as mixture-of-experts and state space models, to fuse data effectively and to handle challenges such as missing modalities and noisy inputs. The field is central to applications in autonomous driving, medical diagnosis, and multimedia analysis, where combining modalities yields more comprehensive and reliable interpretations than any single modality alone. The development of efficient and interpretable fusion methods remains a key focus.
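As a rough illustration of the attention-based fusion idea mentioned above, the sketch below fuses text and image token features with cross-attention in PyTorch. The module name, dimensions, and tensor shapes are assumptions chosen for illustration; this is not the method of any paper listed below.

```python
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Minimal sketch: text tokens attend over image tokens.

    Assumes both modalities have already been projected to a shared
    embedding size `dim` by upstream encoders (not shown here).
    """

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens: torch.Tensor, image_tokens: torch.Tensor) -> torch.Tensor:
        # text_tokens:  (batch, n_text, dim)  -- queries
        # image_tokens: (batch, n_image, dim) -- keys and values
        attended, _ = self.cross_attn(text_tokens, image_tokens, image_tokens)
        # Residual connection keeps the text stream informative even when the
        # image modality is missing or uninformative (attended output near zero).
        return self.norm(text_tokens + attended)


if __name__ == "__main__":
    fusion = CrossAttentionFusion()
    text = torch.randn(2, 16, 256)   # e.g. token embeddings
    image = torch.randn(2, 49, 256)  # e.g. 7x7 patch features
    print(fusion(text, image).shape)  # torch.Size([2, 16, 256])
```

In practice such a block is typically stacked and combined with self-attention layers; the residual path shown here is one common way to remain robust when a modality is absent.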
Papers
Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation
Aniruddh Sikdar, Jayant Teotia, Suresh Sundaram
MMFusion: Combining Image Forensic Filters for Visual Manipulation Detection and Localization
Kostas Triaridis, Konstantinos Tsigos, Vasileios Mezaris