Multimodal Attention
Multimodal attention combines information from different data sources (e.g., text, images, audio) to improve the performance of machine learning models. Current research emphasizes sophisticated attention mechanisms, often within transformer-based architectures, that dynamically weigh each modality's contribution and learn cross-modal relationships. The approach has proven effective across diverse applications, including sentiment analysis, image fusion, and medical diagnosis, yielding more robust and informative models.
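To make the idea concrete, below is a minimal sketch of one common form of cross-modal attention, where text tokens query image features so each token attends over image regions. It assumes PyTorch; the class name `CrossModalAttention` and all dimensions are illustrative, not taken from any of the papers listed here.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """One cross-attention block: text queries attend over image features."""

    def __init__(self, text_dim: int, image_dim: int,
                 embed_dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.image_proj = nn.Linear(image_dim, embed_dim)
        # Queries come from text; keys and values come from the image.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, text_feats: torch.Tensor,
                image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, text_len, text_dim)
        # image_feats: (batch, num_regions, image_dim)
        q = self.text_proj(text_feats)
        kv = self.image_proj(image_feats)
        # Attention weights implicitly decide how much each image
        # region contributes to each text token.
        attended, _ = self.attn(query=q, key=kv, value=kv)
        # Residual connection preserves the original text signal.
        return self.norm(q + attended)

# Example: fuse a 10-token text sequence with 49 image patches.
block = CrossModalAttention(text_dim=768, image_dim=512)
text = torch.randn(2, 10, 768)
image = torch.randn(2, 49, 512)
fused = block(text, image)  # shape: (2, 10, 256)
```

The same pattern extends to other modality pairs (e.g., audio queries over video frames) by swapping the projection layers; stacking such blocks in both directions is one common way to learn the bidirectional cross-modal relationships described above.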
Papers
Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery
Qingyun Fang, Zhaokui Wang
MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering
Fangzhi Xu, Qika Lin, Jun Liu, Lingling Zhang, Tianzhe Zhao, Qi Chai, Yudai Pan