Emotional Speech
Emotional speech research focuses on automatically recognizing and synthesizing human emotion in spoken language, with the aim of improving human-computer interaction and other applications that require emotional intelligence. Current work emphasizes robust models, often built on deep learning architectures such as convolutional neural networks (CNNs), transformers, and diffusion models, that capture the complexities of emotional expression in speech, exploit cross-domain learning from music, and address challenges such as noise and data scarcity. The field advances our understanding of human emotion and its acoustic manifestations, with potential impact on mental health assessment, customer service, and the development of more empathetic, natural-sounding AI systems.
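To make the typical recognition setup concrete, the minimal sketch below shows a CNN classifier that maps a log-mel spectrogram of an utterance to emotion logits. The architecture, emotion classes, and input shape are illustrative assumptions only and are not taken from any of the papers listed below.

```python
import torch
import torch.nn as nn

class SpectrogramEmotionCNN(nn.Module):
    """Hypothetical utterance-level emotion classifier over log-mel spectrograms.

    Illustrative sketch of a common speech emotion recognition setup; it does
    not reproduce any specific published model.
    """

    def __init__(self, n_mels: int = 64, n_emotions: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # (B, 16, n_mels, T)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # halve frequency and time
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # global pooling -> (B, 32, 1, 1)
        )
        self.classifier = nn.Linear(32, n_emotions)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, 1, n_mels, time_frames), e.g. log-mel features of one utterance
        h = self.features(mel).flatten(1)
        return self.classifier(h)  # unnormalized emotion logits


if __name__ == "__main__":
    model = SpectrogramEmotionCNN(n_mels=64, n_emotions=4)
    dummy_batch = torch.randn(8, 1, 64, 300)  # 8 utterances, ~3 s of frames each
    logits = model(dummy_batch)
    print(logits.shape)  # torch.Size([8, 4]), e.g. angry / happy / sad / neutral
```

In practice such a classifier would be trained on labeled emotional speech corpora, and the global pooling makes it tolerant of variable-length utterances.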
Papers
Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment
Ohad Cohen, Gershon Hazan, Sharon Gannot
Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features
Satvik Dixit, Daniel M. Low, Gasser Elbanna, Fabio Catania, Satrajit S. Ghosh
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
Ashishkumar Gudmalwar, Nirmesh Shah, Sai Akarsh, Pankaj Wasnik, Rajiv Ratn Shah
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee