Speech Emotion Corpus

Speech emotion corpora are collections of audio recordings annotated with emotion labels. They serve as the training and evaluation data for speech emotion recognition (SER) systems. Current research focuses on making SER robust across datasets (cross-corpus SER) and in out-of-domain scenarios, using techniques such as contrastive learning, transfer learning, and newer model architectures, including audio-conditioned language models and deep implicit distribution alignment networks. The aim is more accurate and generalizable SER systems, with applications in mental health assessment, human-computer interaction, and personalized user experiences.
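As a rough illustration of what a cross-corpus evaluation setup can look like, the sketch below holds out one corpus at a time for testing and trains on the rest. It is a minimal sketch under stated assumptions, not the protocol of any particular paper: the `Utterance` record, the `leave_one_corpus_out` and `shared_labels` helpers, and the corpus names and file paths in the usage example are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Set, Tuple


@dataclass(frozen=True)
class Utterance:
    """One annotated recording (hypothetical record layout)."""
    corpus: str  # hypothetical corpus identifier
    path: str    # hypothetical path to the audio file
    label: str   # emotion label assigned by annotators


def leave_one_corpus_out(
    utterances: List[Utterance],
) -> Iterator[Tuple[str, List[Utterance], List[Utterance]]]:
    """Yield (held_out_corpus, train, test), holding out each corpus once."""
    by_corpus: Dict[str, List[Utterance]] = defaultdict(list)
    for utt in utterances:
        by_corpus[utt.corpus].append(utt)

    names = sorted(by_corpus)  # deterministic order of splits
    for held_out in names:
        train = [u for name in names if name != held_out for u in by_corpus[name]]
        test = list(by_corpus[held_out])
        yield held_out, train, test


def shared_labels(train: List[Utterance], test: List[Utterance]) -> Set[str]:
    """Labels present on both sides; corpora often use different label sets."""
    return {u.label for u in train} & {u.label for u in test}


if __name__ == "__main__":
    # Toy data with made-up corpus names and paths, for illustration only.
    data = [
        Utterance("corpus_a", "a/001.wav", "angry"),
        Utterance("corpus_a", "a/002.wav", "happy"),
        Utterance("corpus_b", "b/001.wav", "angry"),
        Utterance("corpus_b", "b/002.wav", "sad"),
    ]
    for held_out, train, test in leave_one_corpus_out(data):
        common = sorted(shared_labels(train, test))
        print(held_out, len(train), len(test), common)
```

Restricting evaluation to the labels shared by the training and test corpora is one common way to handle mismatched annotation schemes. Whether that restriction is appropriate depends on the corpora being compared.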

Papers