Multimodal Fusion
Multimodal fusion integrates data from diverse sources (e.g., images, audio, text, sensor readings) to improve the accuracy and robustness of machine learning models across a wide range of applications. Current research emphasizes efficient fusion architectures, including transformers and graph convolutional networks, often with attention mechanisms that weight the contributions of different modalities and mitigate issues such as data sparsity and asynchrony. The field is influencing diverse domains, from medical diagnosis and autonomous driving to human-computer interaction and e-commerce search, by enabling more comprehensive and nuanced analysis of heterogeneous data.
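As a minimal sketch of the attention-weighted fusion idea described above (assuming a PyTorch setting; the AttentionFusion module, its dimensions, and the example embeddings are hypothetical illustrations, not taken from any of the papers listed below):

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse per-modality embeddings using learned attention weights (illustrative sketch)."""
    def __init__(self, dims, fused_dim=256):
        super().__init__()
        # Project each modality (e.g., image, audio, text) into a shared space.
        self.projections = nn.ModuleList(nn.Linear(d, fused_dim) for d in dims)
        # One scalar relevance score per projected modality embedding.
        self.score = nn.Linear(fused_dim, 1)

    def forward(self, features):
        # features: list of tensors, one per modality, each of shape (batch, dim_i)
        projected = torch.stack(
            [proj(x) for proj, x in zip(self.projections, features)], dim=1
        )  # (batch, num_modalities, fused_dim)
        # Softmax over modalities yields the per-modality contribution weights.
        weights = torch.softmax(self.score(projected), dim=1)  # (batch, num_modalities, 1)
        return (weights * projected).sum(dim=1)  # (batch, fused_dim)

# Hypothetical usage: fuse image, audio, and text embeddings of different sizes.
fusion = AttentionFusion(dims=[512, 128, 768])
image, audio, text = torch.randn(4, 512), torch.randn(4, 128), torch.randn(4, 768)
fused = fusion([image, audio, text])  # shape: (4, 256)
```

Because the weights are computed per sample, the model can lean on whichever modality is most informative for a given input, which is one simple way attention is used to handle weak or missing modalities.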
Papers
A Multimodal Fusion Framework for Bridge Defect Detection with Cross-Verification
Ravi Datta Rachuri, Duoduo Liao, Samhita Sarikonda, Datha Vaishnavi Kondur
Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective
Kaifang Long, Guoyang Xie, Lianbo Ma, Jiaqi Liu, Zhichao Lu