Emotion Conversion
Speech emotion conversion aims to alter the emotional expression of a spoken utterance while preserving its linguistic content and speaker identity. Recent research focuses heavily on handling "in-the-wild" data (uncontrolled, non-parallel recordings) with generative models such as diffusion models and variational autoencoders, often using disentangled representations to separate speaker, lexical, and emotional information. These advances, together with techniques such as reinforcement learning and diffeomorphic flows, are improving the naturalness and controllability of synthesized speech with targeted emotions, with applications in speech synthesis, affective computing, and accessibility technologies.
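To make the disentanglement idea concrete, below is a minimal sketch of an encoder-decoder emotion converter in PyTorch. It assumes a mel-spectrogram front end and utterance-level codes; all class names, dimensions, and the simple GRU layers are illustrative assumptions, not any specific published architecture. The key point is that conversion keeps the content and speaker codes from the source utterance while swapping in the emotion code from a reference utterance.

```python
# Minimal sketch of a disentanglement-based emotion conversion model.
# Module names, dimensions, and the mel-spectrogram front end are
# illustrative assumptions, not a specific published system.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Maps a mel-spectrogram (batch, time, n_mels) to a fixed-size embedding."""

    def __init__(self, n_mels: int = 80, dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(n_mels, dim, batch_first=True)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(mel)          # h: (1, batch, dim)
        return h.squeeze(0)           # (batch, dim)


class Decoder(nn.Module):
    """Reconstructs a mel-spectrogram from concatenated content/speaker/emotion codes."""

    def __init__(self, dim: int = 128, n_mels: int = 80):
        super().__init__()
        self.rnn = nn.GRU(3 * dim, 256, batch_first=True)
        self.proj = nn.Linear(256, n_mels)

    def forward(self, codes: torch.Tensor, n_frames: int) -> torch.Tensor:
        # Broadcast the utterance-level codes across the output time axis.
        x = codes.unsqueeze(1).expand(-1, n_frames, -1)
        out, _ = self.rnn(x)
        return self.proj(out)


class EmotionConverter(nn.Module):
    """Separate encoders disentangle content, speaker, and emotion;
    the decoder resynthesizes speech from their concatenation."""

    def __init__(self):
        super().__init__()
        self.content_enc = Encoder()
        self.speaker_enc = Encoder()
        self.emotion_enc = Encoder()
        self.decoder = Decoder()

    def convert(self, src_mel: torch.Tensor, ref_mel: torch.Tensor) -> torch.Tensor:
        # Keep content and speaker codes from the source utterance,
        # but take the emotion code from a reference utterance.
        codes = torch.cat(
            [self.content_enc(src_mel),
             self.speaker_enc(src_mel),
             self.emotion_enc(ref_mel)],
            dim=-1,
        )
        return self.decoder(codes, n_frames=src_mel.size(1))


if __name__ == "__main__":
    model = EmotionConverter()
    src = torch.randn(1, 200, 80)   # neutral source utterance (mel frames)
    ref = torch.randn(1, 180, 80)   # reference utterance with the target emotion
    converted = model.convert(src, ref)
    print(converted.shape)          # torch.Size([1, 200, 80])
```

In practice, the decoder output would be passed to a vocoder, and the encoders would be trained with reconstruction plus disentanglement objectives (e.g., adversarial or mutual-information penalties) so that each code captures only its intended factor.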