Cross Modal
Cross-modal research focuses on integrating information from different data modalities (e.g., text, images, audio) to improve the performance of machine learning models. Current research emphasizes developing robust model architectures, such as contrastive masked autoencoders, diffusion models, and transformers, to effectively align and fuse these diverse data types, often addressing challenges like modality gaps and missing data through techniques like multi-graph alignment and cross-modal contrastive learning. This field is significant because it enables more comprehensive and accurate analysis of complex data, with applications ranging from medical diagnosis and video generation to misinformation detection and person re-identification.
Papers
Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration
Ziheng Zhou, Jinxing Zhou, Wei Qian, Shengeng Tang, Xiaojun Chang, Dan Guo
DuSSS: Dual Semantic Similarity-Supervised Vision-Language Model for Semi-Supervised Medical Image Segmentation
Qingtao Pan, Wenhao Qiao, Jingjiao Lou, Bing Ji, Shuo Li
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
Dongyue Lu, Lingdong Kong, Tianxin Huang, Gim Hee Lee
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong, Chengyao Wang, Yuqi Liu, Senqiao Yang, Longxiang Tang, Yuechen Zhang, Jingyao Li, Tianyuan Qu, Yanwei Li, Yukang Chen, Shaozuo Yu, Sitong Wu, Eric Lo, Shu Liu, Jiaya Jia
GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection
Lingzhi Shen, Yunfei Long, Xiaohao Cai, Imran Razzak, Guanming Chen, Kang Liu, Shoaib Jameel
Dynamic Modality-Camera Invariant Clustering for Unsupervised Visible-Infrared Person Re-identification
Yiming Yang, Weipeng Hu, Haifeng Hu