Multi-Modal Fusion
Multi-modal fusion integrates information from diverse data sources (e.g., images, text, sensor readings) to improve the accuracy and robustness of machine learning models. Current research relies heavily on transformer architectures and attention mechanisms, alongside newer approaches such as mixture-of-experts models and state space models, both to fuse data effectively and to handle challenges such as missing modalities and noisy inputs. The field matters for applications in autonomous driving, medical diagnosis, and multimedia analysis, where combining modalities allows more comprehensive and reliable interpretation than any single modality provides. Developing fusion methods that are both efficient and interpretable remains a key focus.
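As a rough illustration of attention-style fusion under missing modalities, the sketch below computes a softmax-weighted average over only the modalities that are present. It is a minimal toy under stated assumptions, not the method of any paper listed here: the function name `fuse`, the modality names, and the fixed relevance scores are all hypothetical, and in a real model the scores would be produced by a learned attention layer.

```python
import math


def fuse(embeddings, scores):
    """Attention-style weighted average over the modalities that are present.

    embeddings: dict of modality name -> list of floats, or None if missing.
    scores: dict of modality name -> relevance score (hypothetical fixed
            values here; a real model would learn them).
    Returns (fused_vector, weights).
    """
    present = [name for name, emb in embeddings.items() if emb is not None]
    if not present:
        raise ValueError("at least one modality must be present")

    # Softmax restricted to present modalities, so a missing modality
    # contributes nothing regardless of its score.
    max_score = max(scores[name] for name in present)
    exps = {name: math.exp(scores[name] - max_score) for name in present}
    total = sum(exps.values())
    weights = {name: exps[name] / total for name in present}

    # Weighted sum of the embeddings, dimension by dimension.
    dim = len(embeddings[present[0]])
    fused = [
        sum(weights[name] * embeddings[name][i] for name in present)
        for i in range(dim)
    ]
    return fused, weights


if __name__ == "__main__":
    # The sensor reading is missing, so only image and text are fused.
    fused, weights = fuse(
        {"image": [1.0, 0.0], "text": [0.0, 1.0], "sensor": None},
        {"image": 0.0, "text": 0.0, "sensor": 5.0},
    )
    print(fused, weights)
```

Restricting the softmax to available modalities is one simple way to degrade gracefully when an input is absent; the remaining weights renormalize to sum to one.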
Papers
MDA: An Interpretable and Scalable Multi-Modal Fusion under Missing Modalities and Intrinsic Noise Conditions
Lin Fan, Yafei Ou, Cenyang Zheng, Pengyu Dai, Tamotsu Kamishima, Masayuki Ikebe, Kenji Suzuki, Xun Gong
MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion
Jiaxin Deng, Shiyao Wang, Yuchen Wang, Jiansong Qi, Liqin Zhao, Guorui Zhou, Gaofeng Meng