Speech Emotion Recognition
Speech emotion recognition (SER) aims to automatically identify human emotions from speech, with research primarily focused on improving accuracy and robustness across diverse languages and contexts. Current work emphasizes leveraging self-supervised learning models, particularly transformer-based architectures, and explores techniques such as cross-lingual adaptation, multi-modal fusion (combining speech with text or visual data), and efficient model compression for resource-constrained environments. By enabling more natural and empathetic interactions between humans and machines, advances in SER have significant implications for applications including mental health monitoring, human-computer interaction, and personalized healthcare.
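A common downstream recipe behind many of the self-supervised approaches above is to mean-pool the frame-level embeddings produced by a pretrained encoder (e.g. a wav2vec 2.0-style model) and feed the pooled vector to a lightweight classification head. The sketch below illustrates only that pooling-plus-linear-head step in plain Python; the embedding dimension, random "frame features", weights, and four-way emotion label set are all illustrative stand-ins, not taken from any specific paper listed here.

```python
import math
import random

def mean_pool(frames):
    """Average frame-level embeddings (T x D) into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(utterance_vec, weights, biases):
    """Linear head: one logit per emotion class, then softmax probabilities."""
    logits = [sum(w * x for w, x in zip(row, utterance_vec)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # illustrative label set

random.seed(0)
D = 8  # toy embedding size; real SSL encoders output 768+ dimensions
# Stand-in for encoder output: 50 frames of D-dimensional features.
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(50)]
# Untrained toy head; in practice these weights are learned on labeled SER data.
W = [[random.gauss(0, 0.1) for _ in range(D)] for _ in EMOTIONS]
b = [0.0] * len(EMOTIONS)

probs = classify(mean_pool(frames), W, b)
print(EMOTIONS[max(range(len(probs)), key=probs.__getitem__)])
```

In practice the encoder is either frozen (only the head is trained) or fine-tuned end to end; the pooled-feature-plus-head structure stays the same either way.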
Papers
Describing emotions with acoustic property prompts for speech emotion recognition
Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh
Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition
Jiaxin Ye, Xin-cheng Wen, Yujie Wei, Yong Xu, Kunhong Liu, Hongming Shan
Sentiment recognition of Italian elderly through domain adaptation on cross-corpus speech dataset
Francesca Gasparini, Alessandra Grossi
Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning
Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller
Pretrained audio neural networks for Speech emotion recognition in Portuguese
Marcelo Matheus Gauy, Marcelo Finger
Fast Yet Effective Speech Emotion Recognition with Self-distillation
Zhao Ren, Thanh Tam Nguyen, Yi Chang, Björn W. Schuller
Effect of different splitting criteria on the performance of speech emotion recognition
Bagus Tris Atmaja, Akira Sasou