Cross-Modal
Cross-modal research focuses on integrating information from different data modalities (e.g., text, images, audio) to improve the performance of machine learning models. Current work emphasizes robust architectures, such as contrastive masked autoencoders, diffusion models, and transformers, for aligning and fusing these diverse data types, and addresses challenges such as modality gaps and missing modalities through techniques like multi-graph alignment and cross-modal contrastive learning. The field matters because it enables more comprehensive and accurate analysis of complex data, with applications ranging from medical diagnosis and video generation to misinformation detection and person re-identification.
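As a rough illustration of the cross-modal contrastive learning mentioned above, the sketch below (in PyTorch) computes a symmetric InfoNCE-style loss that pulls paired image and text embeddings together and pushes mismatched pairs apart. The encoder outputs image_emb and text_emb are hypothetical placeholders; this is a minimal sketch of the general technique, not the method of any paper listed below.

import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE loss over a batch of paired image/text embeddings.
    # L2-normalize so that dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature

    # Matched image-text pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Align in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    batch, dim = 8, 256
    image_emb = torch.randn(batch, dim)
    text_emb = torch.randn(batch, dim)
    print(cross_modal_contrastive_loss(image_emb, text_emb).item())

In practice the two embeddings would come from separate modality-specific encoders trained jointly, which is the basic mechanism by which contrastive objectives narrow the modality gap described above.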
Papers
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
Wenfang Yao, Kejing Yin, William K. Cheung, Jia Liu, Jing Qin
RESTORE: Towards Feature Shift for Vision-Language Prompt Learning
Yuncheng Yang, Chuyan Zhang, Zuopeng Yang, Yuting Gao, Yulei Qin, Ke Li, Xing Sun, Jie Yang, Yun Gu
A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition
Tyler Benster, Guy Wilson, Reshef Elisha, Francis R Willett, Shaul Druckmann
REWIND Dataset: Privacy-preserving Speaking Status Segmentation from Multimodal Body Movement Signals in the Wild
Jose Vargas Quiros, Chirag Raman, Stephanie Tan, Ekin Gedik, Laura Cabrera-Quiros, Hayley Hung