Multimodal Feature
Multimodal feature research focuses on integrating information from multiple data sources (e.g., text, images, audio) to create richer, more comprehensive representations for various tasks. Current research emphasizes effective fusion strategies, often employing attention mechanisms, transformers, and graph neural networks to capture inter- and intra-modal relationships, and addressing challenges like modality alignment and handling asynchronous data. This field is significant for improving the accuracy and robustness of applications across diverse domains, including medical diagnosis, emotion recognition, and fake news detection, by leveraging the complementary strengths of different data modalities.
Papers
A Dual Branch Network for Emotional Reaction Intensity Estimation
Jun Yu, Jichao Zhu, Wangyuan Zhu, Zhongpeng Cai, Guochen Xie, Renda Li, Gongpeng Zhao
Emotional Reaction Intensity Estimation Based on Multimodal Data
Shangfei Wang, Jiaqiang Wu, Feiyi Zheng, Xin Li, Xuewei Li, Suwen Wang, Yi Wu, Yanan Chang, Xiangyu Miao
Affective Feedback Synthesis Towards Multimodal Text and Image Data
Puneet Kumar, Gaurav Bhat, Omkar Ingle, Daksh Goyal, Balasubramanian Raman
Transformer-based Multimodal Information Fusion for Facial Expression Analysis
Wei Zhang, Feng Qiu, Suzhen Wang, Hao Zeng, Zhimeng Zhang, Rudong An, Bowen Ma, Yu Ding