Cross-Modal
Cross-modal research focuses on integrating information from different data modalities (e.g., text, images, audio) to improve the performance of machine learning models. Current work emphasizes robust model architectures, such as contrastive masked autoencoders, diffusion models, and transformers, that align and fuse these diverse data types, often addressing challenges such as modality gaps and missing data through techniques like multi-graph alignment and cross-modal contrastive learning. The field is significant because it enables more comprehensive and accurate analysis of complex data, with applications ranging from medical diagnosis and video generation to misinformation detection and person re-identification.
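To make the idea of cross-modal contrastive learning concrete, the following is a minimal PyTorch sketch of a CLIP-style symmetric InfoNCE loss that pulls paired image and text embeddings together in a shared space while pushing apart mismatched pairs. The function name, batch size, embedding dimension, and temperature value are illustrative assumptions, not taken from any paper listed below.

import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # L2-normalize each modality so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity matrix: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature
    # Matched image-text pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric InfoNCE: classify the correct text for each image and vice versa.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Usage: a batch of 8 paired embeddings of dimension 512 (hypothetical sizes),
# as would be produced by separate image and text encoders.
image_emb = torch.randn(8, 512)
text_emb = torch.randn(8, 512)
loss = cross_modal_contrastive_loss(image_emb, text_emb)

The symmetric two-direction loss is the standard choice here because it trains both encoders to share one embedding space, which is what makes downstream cross-modal retrieval and alignment possible.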
Papers
DualKanbaFormer: Kolmogorov-Arnold Networks and State Space Model Transformer for Multimodal Aspect-based Sentiment Analysis
Adamu Lawan, Juhua Pu, Haruna Yunusa, Muhammad Lawan, Aliyu Umar, Adamu Sani Yahya
Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning
Sakhinana Sagar Srinivas, Venkataramana Runkana
Feedback-based Modal Mutual Search for Attacking Vision-Language Pre-training Models
Renhua Ding, Xinze Zhang, Xiao Yang, Kun He
Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach
Zhe Fu, Kanlun Wang, Wangjiaxuan Xin, Lina Zhou, Shi Chen, Yaorong Ge, Daniel Janies, Dongsong Zhang
RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search
Meng Chen, Kai Zhang, Zhenying He, Yinan Jing, X. Sean Wang
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
Jingjing Wu, Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Fanglin Chen, Guangming Lu, Wenjie Pei
Start from Video-Music Retrieval: An Inter-Intra Modal Loss for Cross Modal Retrieval
Zeyu Chen, Pengfei Zhang, Kai Ye, Wei Dong, Xin Feng, Yana Zhang