Cross Modal
Cross-modal research focuses on integrating information from different data modalities (e.g., text, images, audio) to improve the performance of machine learning models. Current work emphasizes robust architectures such as contrastive masked autoencoders, diffusion models, and transformers that align and fuse these diverse data types, and it addresses challenges such as modality gaps and missing data with techniques like multi-graph alignment and cross-modal contrastive learning. The field is significant because it enables more comprehensive and accurate analysis of complex data, with applications ranging from medical diagnosis and video generation to misinformation detection and person re-identification.
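To make the alignment idea concrete, the following is a minimal, illustrative sketch of cross-modal contrastive learning in the CLIP style: a symmetric InfoNCE loss that pulls matched image/text embedding pairs together in a shared space and pushes mismatched pairs apart. It is not the method of any paper listed below; the function name, tensor shapes, and temperature value are assumptions for illustration, and the embeddings would normally come from separate image and text encoders.

```python
# Illustrative sketch of a cross-modal contrastive (InfoNCE) objective.
# Assumes paired image/text embeddings produced by separate encoders.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(image_emb: torch.Tensor,
                                 text_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of matched image/text pairs."""
    # L2-normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: logits[i, j] = sim(image_i, text_j) / T.
    logits = image_emb @ text_emb.t() / temperature

    # The i-th image matches the i-th text, so the target is the diagonal.
    targets = torch.arange(image_emb.size(0), device=image_emb.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

if __name__ == "__main__":
    # Random embeddings stand in for encoder outputs (batch of 8, dim 512).
    imgs = torch.randn(8, 512)
    txts = torch.randn(8, 512)
    print(cross_modal_contrastive_loss(imgs, txts).item())
```

Variants of this objective underpin many of the retrieval, hashing, and captioning approaches collected below, which differ mainly in how the encoders are built and how missing or noisy modalities are handled.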
Papers
Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning
Songning Lai, Jiakang Li, Guinan Guo, Xifeng Hu, Yulong Li, Yuan Tan, Zichen Song, Yutong Liu, Zhaoxia Ren, Chun Wan, Danmin Miao, Zhi Liu
Hierarchical Aligned Multimodal Learning for NER on Tweet Posts
Peipei Liu, Hong Li, Yimo Ren, Jie Liu, Shuaizong Si, Hongsong Zhu, Limin Sun
Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models
Abhishek Mandal, Susan Leavy, Suzanne Little
Deep Lifelong Cross-modal Hashing
Liming Xu, Hanqi Li, Bochuan Zheng, Weisheng Li, Jiancheng Lv
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
Junyang Wang, Ming Yan, Yi Zhang, Jitao Sang
Interpretable Detection of Out-of-Context Misinformation with Neural-Symbolic-Enhanced Large Multimodal Model
Yizhou Zhang, Loc Trinh, Defu Cao, Zijun Cui, Yan Liu
CoVLR: Coordinating Cross-Modal Consistency and Intra-Modal Structure for Vision-Language Retrieval
Yang Yang, Zhongtian Fu, Xiangyu Wu, Wenjie Li