Cross-Modal Attention
Cross-modal attention integrates information from multiple data sources (e.g., images, audio, text) to improve the performance of machine learning models. Current research emphasizes sophisticated attention mechanisms within transformer-based architectures that fuse these heterogeneous modalities, often incorporating techniques such as co-guidance attention, hierarchical attention, and contrastive learning to enhance feature representation and alignment. These approaches have proven effective across diverse applications, including medical image analysis, audio-visual event localization, and deepfake detection, improving both accuracy and interpretability. The ability to combine information from different modalities effectively holds significant promise for advancing a wide range of scientific and technological domains.
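To make the mechanism concrete, below is a minimal sketch (in PyTorch) of one common cross-modal attention pattern, in which features from one modality act as queries over keys and values from another. The module and variable names are illustrative assumptions, not taken from any of the papers listed below.

```python
# Minimal sketch of cross-modal attention: text features attend to image features.
# Names and dimensions are illustrative, not drawn from a specific paper.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Queries come from one modality; keys and values come from the other.
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, text_len, dim)    -- query modality
        # image_feats: (batch, num_patches, dim) -- key/value modality
        fused, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        # Residual connection keeps the original text representation in the mix.
        return self.norm(text_feats + fused)

if __name__ == "__main__":
    # Random features stand in for real encoder outputs.
    text = torch.randn(2, 16, 256)   # e.g., token embeddings from a text encoder
    image = torch.randn(2, 49, 256)  # e.g., patch embeddings from a vision encoder
    out = CrossModalAttention()(text, image)
    print(out.shape)                 # torch.Size([2, 16, 256])
```

In practice, such a block is typically stacked or made bidirectional (each modality attends to the other), and the fused features feed downstream task heads.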
Papers
SI-LSTM: Speaker Hybrid Long-short Term Memory and Cross Modal Attention for Emotion Recognition in Conversation
Xingwei Liang, You Zou, Ruifeng Xu
Learning Missing Modal Electronic Health Records with Unified Multi-modal Data Embedding and Modality-Aware Attention
Kwanhyung Lee, Soojeong Lee, Sangchul Hahn, Heejung Hyun, Edward Choi, Byungeun Ahn, Joohyung Lee