Cross-Modal
Cross-modal research focuses on integrating information from different data modalities (e.g., text, images, audio) to improve the performance of machine learning models. Current work emphasizes robust architectures, such as contrastive masked autoencoders, diffusion models, and transformers, that align and fuse these diverse data types, often addressing challenges such as modality gaps and missing data through techniques like multi-graph alignment and cross-modal contrastive learning. The field is significant because it enables more comprehensive and accurate analysis of complex data, with applications ranging from medical diagnosis and video generation to misinformation detection and person re-identification.
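To make the alignment idea concrete, below is a minimal sketch of a symmetric cross-modal contrastive (InfoNCE-style) objective, in which paired embeddings from two modalities are pulled together and mismatched pairs pushed apart. It is illustrative only and not taken from any of the papers listed here; the function name, the temperature value, and the use of random tensors in place of real encoder outputs are all assumptions.

```python
# Minimal sketch of a symmetric cross-modal contrastive (InfoNCE) loss,
# assuming paired batches from two modalities (e.g., text and images).
# All names and values here are illustrative placeholders.
import torch
import torch.nn.functional as F


def cross_modal_contrastive_loss(emb_a: torch.Tensor,
                                 emb_b: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """emb_a, emb_b: (batch, dim) embeddings of paired samples from two modalities."""
    # L2-normalize so dot products become cosine similarities.
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)

    # Pairwise similarity matrix; diagonal entries correspond to true (aligned) pairs.
    logits = emb_a @ emb_b.t() / temperature
    targets = torch.arange(emb_a.size(0), device=emb_a.device)

    # Symmetric cross-entropy: align modality A to B and B to A.
    loss_a = F.cross_entropy(logits, targets)
    loss_b = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_a + loss_b)


if __name__ == "__main__":
    # Random embeddings stand in for the outputs of two modality-specific encoders.
    a = torch.randn(8, 256)  # e.g., text encoder output
    b = torch.randn(8, 256)  # e.g., image encoder output
    print(cross_modal_contrastive_loss(a, b).item())
```

In practice the two embeddings would come from modality-specific encoders trained jointly, and the temperature is typically a learned or tuned hyperparameter.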
Papers
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
Adriel Saporta, Aahlad Puli, Mark Goldstein, Rajesh Ranganath
Text2Freq: Learning Series Patterns from Text via Frequency Domain
Ming-Chih Lo, Ching Chang, Wen-Chih Peng
CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis
Fuying Wang, Feng Wu, Yihan Tang, Lequan Yu
LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model
Duy M. H. Nguyen, Nghiem T. Diep, Trung Q. Nguyen, Hoang-Bao Le, Tai Nguyen, Tien Nguyen, TrungTin Nguyen, Nhat Ho, Pengtao Xie, Roger Wattenhofer, James Zhou, Daniel Sonntag, Mathias Niepert
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning
Zheng Zhang, Xu Yuan, Lei Zhu, Jingkuan Song, Liqiang Nie
C-MELT: Contrastive Enhanced Masked Auto-Encoders for ECG-Language Pre-Training
Manh Pham, Aaqib Saeed, Dong Ma