Dimensional Emotion Recognition

Dimensional emotion recognition aims to automatically estimate continuous emotional states, such as valence and arousal, from modalities such as audio, video, and text. Current research relies heavily on deep learning, particularly recurrent neural networks, transformers, and attention mechanisms (including cross-modal attention), to fuse information across modalities and improve accuracy. The field matters for human-computer interaction, personalized experiences (e.g., music recommendation), and mental health applications, because it enables a more nuanced and accurate understanding of human affect.
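As a rough illustration of the cross-modal attention fusion mentioned above, the following minimal NumPy sketch lets each audio frame attend over video frames, then maps the fused features to a (valence, arousal) pair per frame. It is not taken from any particular paper: the function names, feature dimensions, random untrained weights, and the tanh output range of [-1, 1] are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_seq, context_seq, w_q, w_k, w_v):
    # Queries come from one modality, keys and values from the other.
    q = query_seq @ w_q        # (T_q, d)
    k = context_seq @ w_k      # (T_c, d)
    v = context_seq @ w_v      # (T_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product, (T_q, T_c)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


def predict_valence_arousal(audio, video, params):
    # Hypothetical fusion head: audio frames attend over video frames,
    # the attended video context is concatenated with the audio features,
    # and a linear layer with tanh yields (valence, arousal) per audio frame.
    attended, weights = cross_modal_attention(
        audio, video, params["w_q"], params["w_k"], params["w_v"]
    )
    fused = np.concatenate([audio, attended], axis=-1)        # (T_a, d_audio + d)
    va = np.tanh(fused @ params["w_out"] + params["b_out"])   # (T_a, 2)
    return va, weights


# Toy inputs: random features and untrained weights (assumed sizes).
rng = np.random.default_rng(0)
d_audio, d_video, d = 8, 12, 16
params = {
    "w_q": rng.normal(scale=0.1, size=(d_audio, d)),
    "w_k": rng.normal(scale=0.1, size=(d_video, d)),
    "w_v": rng.normal(scale=0.1, size=(d_video, d)),
    "w_out": rng.normal(scale=0.1, size=(d_audio + d, 2)),
    "b_out": np.zeros(2),
}
audio = rng.normal(size=(5, d_audio))   # 5 audio frames
video = rng.normal(size=(7, d_video))   # 7 video frames
va, weights = predict_valence_arousal(audio, video, params)
```

Here `va` holds one valence and one arousal estimate per audio frame, and `weights` shows how strongly each audio frame attends to each video frame. A trained system would learn the projection matrices from annotated data and would typically use multiple heads and attention in both directions.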

Papers