Speech Emotion

Speech emotion recognition (SER) aims to automatically identify the emotions conveyed in spoken language, modeling them either as discrete categories (e.g., happy, sad) or as continuous dimensions (e.g., valence, arousal). Current research emphasizes improving model robustness and generalization across languages and demographics, employing techniques such as self-supervised learning, large language models (LLMs), and various deep learning architectures (e.g., CNNs, Transformers). Advances in SER have significant implications for human-computer interaction, particularly in applications requiring emotional intelligence, such as customer service, mental health monitoring, and personalized education.
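To make the categorical/dimensional distinction concrete, below is a minimal, hypothetical PyTorch sketch (not drawn from any specific paper listed here): a shared acoustic encoder over frame-level features feeds both a categorical emotion classifier and a valence/arousal regressor. The feature dimensions, class set, and layer sizes are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class SERModel(nn.Module):
    """Toy SER model: a shared 1-D CNN encoder over acoustic features,
    with a categorical head (discrete emotion classes) and a
    dimensional head (valence/arousal regression)."""

    def __init__(self, n_features: int = 80, n_emotions: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_features, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # mean-pool over the time axis
        )
        self.categorical_head = nn.Linear(128, n_emotions)  # e.g. happy/sad/angry/neutral
        self.dimensional_head = nn.Linear(128, 2)           # valence, arousal

    def forward(self, x: torch.Tensor):
        # x: (batch, n_features, time), e.g. log-mel spectrogram frames
        h = self.encoder(x).squeeze(-1)  # (batch, 128) utterance embedding
        return self.categorical_head(h), self.dimensional_head(h)

# Usage: a batch of 2 utterances, 80 mel bins, 300 frames
model = SERModel()
logits, va = model(torch.randn(2, 80, 300))
print(logits.shape, va.shape)  # torch.Size([2, 4]) torch.Size([2, 2])
```

In practice, the hand-rolled CNN encoder would typically be replaced by a pretrained self-supervised speech model, with the two heads trained jointly on labeled emotion corpora.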

Papers