Multimodal Data
Multimodal data analysis focuses on integrating information from diverse sources like text, images, audio, and sensor data to achieve a more comprehensive understanding than any single modality allows. Current research emphasizes developing effective fusion techniques, often employing transformer-based architectures, variational autoencoders, or large language models to combine and interpret these heterogeneous data types for tasks ranging from sentiment analysis and medical image interpretation to financial forecasting and summarization. This field is significant because it enables more robust and accurate models across numerous applications, improving decision-making in areas like healthcare, finance, and environmental monitoring.
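To make the fusion idea concrete, below is a minimal sketch of transformer-based late fusion of two modalities (text tokens and an image) in PyTorch. All module names, dimensions, and the toy encoders are illustrative assumptions for this sketch, not taken from any of the papers listed here.

import torch
import torch.nn as nn

class SimpleFusionModel(nn.Module):
    """Illustrative two-modality fusion: encode each modality, concatenate
    the resulting token sequences, and let a shared transformer attend across them."""
    def __init__(self, text_vocab=1000, d_model=64, n_classes=3):
        super().__init__()
        # Per-modality encoders: an embedding table for text tokens and a
        # small CNN that maps an image to a single feature vector.
        self.text_embed = nn.Embedding(text_vocab, d_model)
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, d_model),
        )
        # Shared transformer encoder fuses the concatenated token sequence.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, text_ids, image):
        text_tokens = self.text_embed(text_ids)               # (B, T, d_model)
        image_token = self.image_encoder(image).unsqueeze(1)  # (B, 1, d_model)
        tokens = torch.cat([text_tokens, image_token], dim=1) # concatenate modalities
        fused = self.fusion(tokens)                           # cross-modal attention
        return self.classifier(fused.mean(dim=1))             # pooled prediction

# Toy forward pass with random inputs.
model = SimpleFusionModel()
logits = model(torch.randint(0, 1000, (2, 8)), torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 3])

Real systems typically swap the toy encoders for pretrained ones (e.g., a language model for text and a vision backbone for images), but the fusion pattern, projecting each modality into a shared token space and attending across modalities, is the same.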
Papers
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu, Yuan Yao, Chongyi Wang, Shan Wang, Yinxu Pan, Qianyu Chen, Tianyu Yu, Hanghao Wu, Yue Zhao, Haoye Zhang, Xu Han, Yankai Lin, Jiao Xue, Dahai Li, Zhiyuan Liu, Maosong Sun
SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets
Cody Simons, Dripta S. Raychaudhuri, Sk Miraj Ahmed, Suya You, Konstantinos Karydis, Amit K. Roy-Chowdhury
ZS-MSTM: Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding
Mireille Fares, Catherine Pelachaud, Nicolas Obin
Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement
De Cheng, Xiaojian Huang, Nannan Wang, Lingfeng He, Zhihui Li, Xinbo Gao
More Perspectives Mean Better: Underwater Target Recognition and Localization with Multimodal Data via Symbiotic Transformer and Multiview Regression
Shipei Liu, Xiaoya Fan, Guowei Wu