Synthetic Emotional Speech

Research on synthetic emotional speech aims to generate speech that convincingly conveys target emotions, enhancing human-computer interaction and compensating for the scarcity of labeled emotional speech data. Current approaches rely on deep learning, including generative adversarial networks (GANs), diffusion models, and reinforcement learning, often building on text-to-speech synthesis and incorporating multimodal inputs (text, images, and physiological signals). The field matters for advancing speech emotion recognition systems, improving the realism of virtual agents and characters, and enabling more engaging, expressive communication technologies.
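A common thread in these systems is conditioning an acoustic model on an emotion representation alongside the text encoding. The sketch below illustrates that conditioning pattern only; every name in it (the emotion list, the embedding table, the dimensions, the linear projection standing in for a trained decoder) is a made-up placeholder, not the API of any real TTS system.

```python
import numpy as np

# Illustrative sketch of emotion-conditioned acoustic feature generation.
# A real system would replace the random parameters below with a trained
# neural TTS model; dimensions and names here are hypothetical.

EMOTIONS = ["neutral", "happy", "sad", "angry"]
EMBED_DIM = 8    # size of the (pretend) learned emotion embedding
TEXT_DIM = 16    # size of the per-frame text encoding
MEL_BINS = 80    # mel-spectrogram frequency bins

rng = np.random.default_rng(0)

# Stand-ins for learned parameters: an emotion embedding table and a
# linear projection from [text ; emotion] features to mel frames.
emotion_table = rng.standard_normal((len(EMOTIONS), EMBED_DIM))
projection = rng.standard_normal((TEXT_DIM + EMBED_DIM, MEL_BINS)) * 0.1

def synthesize_frames(text_encoding: np.ndarray, emotion: str) -> np.ndarray:
    """Concatenate an emotion embedding onto each text frame, then
    project to mel-spectrogram-like features."""
    emb = emotion_table[EMOTIONS.index(emotion)]               # (EMBED_DIM,)
    emb = np.broadcast_to(emb, (text_encoding.shape[0], EMBED_DIM))
    conditioned = np.concatenate([text_encoding, emb], axis=1)
    return conditioned @ projection                            # (frames, MEL_BINS)

frames = rng.standard_normal((50, TEXT_DIM))  # fake encoder output, 50 frames
mel = synthesize_frames(frames, "happy")
print(mel.shape)  # (50, 80)
```

Because the emotion enters as an extra input rather than a separate model, the same text encoding yields different acoustic features per emotion, which is the property GAN-, diffusion-, and RL-based variants all exploit in their own training objectives.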

Papers