Multimodal Emotion Recognition
Multimodal emotion recognition (MER) aims to identify human emotions accurately by integrating information from sources such as facial expressions, speech, and physiological signals. Current research focuses heavily on building models that remain robust to incomplete or noisy data, often employing transformer architectures, graph neural networks, and contrastive learning to fuse multimodal information effectively and to address class imbalance. These advances matter for human-computer interaction, mental health assessment, and other applications that require a nuanced understanding of emotional states. The field is also exploring open-vocabulary emotion recognition and the integration of large language models for more comprehensive, context-aware emotion analysis.
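To make the fusion idea concrete, below is a minimal sketch (not taken from any of the cited papers) of transformer-based multimodal fusion for emotion classification: each modality's feature sequence is projected into a shared space, concatenated, and passed through a transformer encoder so self-attention mixes information across modalities. All feature dimensions, sequence lengths, and the class count are illustrative assumptions.

```python
# Illustrative sketch of transformer-based multimodal fusion; dimensions are assumptions.
import torch
import torch.nn as nn

class SimpleMultimodalFusion(nn.Module):
    def __init__(self, dims, d_model=128, num_classes=6):
        super().__init__()
        # Project each modality's features into a shared d_model-dimensional space.
        self.proj = nn.ModuleDict({m: nn.Linear(d, d_model) for m, d in dims.items()})
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, inputs):
        # inputs: dict of (batch, seq_len_m, dim_m) tensors, one entry per modality.
        tokens = torch.cat([self.proj[m](x) for m, x in inputs.items()], dim=1)
        fused = self.encoder(tokens)                # cross-modal self-attention
        return self.classifier(fused.mean(dim=1))   # pool over all modality tokens

# Example usage with random features for a batch of 2 utterances
# (hypothetical audio/text/visual feature sizes).
model = SimpleMultimodalFusion(dims={"audio": 74, "text": 300, "visual": 35})
batch = {"audio": torch.randn(2, 50, 74),
         "text": torch.randn(2, 20, 300),
         "visual": torch.randn(2, 50, 35)}
logits = model(batch)  # shape: (2, 6)
```

This is only one of several fusion strategies mentioned above; graph-based and contrastive approaches, as well as methods designed for unaligned or missing modalities, replace or augment the concatenate-and-attend step shown here.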
Papers
Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts
Harsh Agarwal, Keshav Bansal, Abhinav Joshi, Ashutosh Modi
LMR-CBT: Learning Modality-fused Representations with CB-Transformer for Multimodal Emotion Recognition from Unaligned Multimodal Sequences
Ziwang Fu, Feng Liu, Hanyang Wang, Siyuan Shen, Jiahao Zhang, Jiayin Qi, Xiangling Fu, Aimin Zhou