Multimodal Emotion Recognition

Multimodal emotion recognition (MER) aims to identify human emotions accurately by integrating complementary signals such as facial expressions, speech, and physiological measurements. Current research focuses heavily on building models that remain robust to incomplete or noisy inputs, commonly employing transformer architectures, graph neural networks, and contrastive learning to fuse multimodal information effectively and to mitigate class imbalance. These advances matter for human-computer interaction, mental health assessment, and other applications that require a nuanced understanding of emotional states. The field is also moving toward open-vocabulary emotion recognition and the integration of large language models for more comprehensive, context-aware emotion analysis.
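
To make the transformer-based fusion mentioned above concrete, here is a minimal PyTorch sketch of one common pattern: cross-modal attention, where each modality attends to the other before pooled features are combined for classification. The module name, feature dimensions, and seven-class output are illustrative assumptions, not drawn from any particular paper in this collection.

```python
# Minimal sketch of cross-modal attention fusion for MER.
# All shapes and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse pre-extracted audio and visual feature sequences:
    each modality attends to the other, and the pooled outputs
    are concatenated for emotion classification."""
    def __init__(self, dim=256, heads=4, num_classes=7):
        super().__init__()
        self.audio_to_visual = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.visual_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, audio, visual):
        # audio: (B, T_a, dim), visual: (B, T_v, dim)
        a_att, _ = self.audio_to_visual(audio, visual, visual)  # audio queries visual
        v_att, _ = self.visual_to_audio(visual, audio, audio)   # visual queries audio
        # Mean-pool over time, concatenate the two fused views, classify
        fused = torch.cat([a_att.mean(dim=1), v_att.mean(dim=1)], dim=-1)
        return self.classifier(fused)

model = CrossModalFusion()
logits = model(torch.randn(2, 50, 256), torch.randn(2, 30, 256))
print(logits.shape)  # torch.Size([2, 7])
```

Real systems in this literature extend this idea with more modalities, stacked attention layers, graph-based message passing, or contrastive alignment objectives, and add mechanisms for handling missing modalities; this sketch only shows the basic fusion step.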

Papers