Speech Emotion Recognition
Speech emotion recognition (SER) aims to automatically identify human emotions from speech, with a primary focus on improving accuracy and robustness across diverse languages and contexts. Current research emphasizes self-supervised learning models, particularly transformer-based architectures, along with techniques such as cross-lingual adaptation, multi-modal fusion (combining speech with text or visual data), and model compression for resource-constrained environments. Advances in SER have significant implications for applications including mental health monitoring, human-computer interaction, and personalized healthcare, enabling more natural and empathetic interaction between humans and machines.
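The multi-modal fusion mentioned above is often done at the decision level: each modality's model produces a probability distribution over a shared emotion label set, and the distributions are combined before the final decision. A minimal sketch, with the real acoustic and text networks replaced by fixed hypothetical probability vectors:

```python
# Decision-level (late) fusion sketch for multi-modal SER.
# The emotion label set, the probability vectors, and the fusion weight
# are illustrative assumptions, not taken from any specific paper.

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def fuse_predictions(acoustic_probs, text_probs, weight=0.5):
    """Weighted average of two per-class probability vectors.

    Returns the fused emotion label and the fused distribution.
    `weight` controls how much the acoustic modality contributes.
    """
    fused = [weight * a + (1 - weight) * t
             for a, t in zip(acoustic_probs, text_probs)]
    label = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
    return label, fused

# Hypothetical per-utterance model outputs (one value per emotion)
acoustic = [0.10, 0.20, 0.60, 0.10]   # acoustic model leans "neutral"
text     = [0.05, 0.70, 0.15, 0.10]   # text model leans "happy"

label, fused = fuse_predictions(acoustic, text)
print(label)  # the text model's confident "happy" wins at equal weighting
```

In practice the fusion step can itself be learned, e.g. by feeding both modalities' outputs into a classifier such as an SVM, as in the two-stage dimensional emotion recognition paper listed below.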
Papers
Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning
Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller
Pretrained audio neural networks for Speech emotion recognition in Portuguese
Marcelo Matheus Gauy, Marcelo Finger
Fast Yet Effective Speech Emotion Recognition with Self-distillation
Zhao Ren, Thanh Tam Nguyen, Yi Chang, Björn W. Schuller
Effect of different splitting criteria on the performance of speech emotion recognition
Bagus Tris Atmaja, Akira Sasou
Two-stage dimensional emotion recognition by fusing predictions of acoustic and text networks using SVM
Bagus Tris Atmaja, Masato Akagi