Multimodal Sentiment Analysis
Multimodal sentiment analysis (MSA) aims to understand human sentiment by integrating information from multiple modalities such as text, audio, and video. Current research emphasizes robust methods for handling incomplete data, improving the fusion of heterogeneous modalities (often treating text as the dominant modality), and mitigating biases inherent in datasets. This work explores advanced architectures such as transformers and employs techniques including contrastive learning, knowledge distillation, and attention mechanisms to improve accuracy and interpretability. By enabling a more nuanced and accurate reading of emotional expression, MSA informs fields including human-computer interaction, social media analysis, and mental health monitoring.
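To make the fusion and attention ideas above concrete, below is a minimal, hypothetical sketch in PyTorch of text-prioritized cross-modal attention fusion. The class name `CrossModalFusion`, the feature dimensions (chosen to resemble common BERT, acoustic, and facial-feature extractors), and the regression head are illustrative assumptions, not the method of either paper listed below.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Text-centric fusion: text features query audio/video via attention.

    A hypothetical sketch; dimensions and layer choices are illustrative.
    """

    def __init__(self, d_text=768, d_audio=74, d_video=35, d_model=128, n_heads=4):
        super().__init__()
        # Project each modality into a shared d_model space.
        self.proj_text = nn.Linear(d_text, d_model)
        self.proj_audio = nn.Linear(d_audio, d_model)
        self.proj_video = nn.Linear(d_video, d_model)
        # Text supplies the queries; audio/video supply keys and values,
        # reflecting the common text-prioritizing design noted above.
        self.attn_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_video = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(3 * d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, 1),  # scalar sentiment intensity
        )

    def forward(self, text, audio, video):
        # Each input: (batch, seq_len_modality, feature_dim_modality).
        t = self.proj_text(text)
        a = self.proj_audio(audio)
        v = self.proj_video(video)
        t2a, _ = self.attn_audio(t, a, a)  # text attends to audio
        t2v, _ = self.attn_video(t, v, v)  # text attends to video
        # Pool over time and concatenate the three views.
        fused = torch.cat([t.mean(1), t2a.mean(1), t2v.mean(1)], dim=-1)
        return self.head(fused).squeeze(-1)

# Toy usage with random features standing in for pre-trained encoder outputs.
model = CrossModalFusion()
text = torch.randn(2, 50, 768)   # e.g., token embeddings from a text encoder
audio = torch.randn(2, 400, 74)  # e.g., frame-level acoustic features
video = torch.randn(2, 60, 35)   # e.g., facial expression features
print(model(text, audio, video).shape)  # torch.Size([2])
```

Using text as the query side reflects the text-prioritizing tendency mentioned above; swapping the query and key/value roles yields audio- or video-centric variants of the same fusion scheme.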
Papers
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis
Peipei Liu, Xin Zheng, Hong Li, Jie Liu, Yimo Ren, Hongsong Zhu, Limin Sun