Speaker Generation
Speaker generation aims to synthesize realistic speech in the voices of speakers who do not exist, with an emphasis on voices that are both diverse and controllable. Current research builds on pre-trained models such as text-to-speech systems, combining them with attribute interpolation (e.g., model merging, optimal transport) and prompt-based control, which steers speaker characteristics from text descriptions. The field matters for entertainment, accessibility technologies, and data augmentation, and it also poses challenges for areas such as deepfake detection and speaker de-identification.
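To make the idea of attribute interpolation concrete, here is a minimal sketch of the simplest form of model merging: a weighted average of two sets of speaker-specific parameters. It is a toy illustration under stated assumptions, not the method of any particular paper. The function name `merge_weights`, the parameter name `speaker_embedding`, and the numeric values are all hypothetical, and real systems operate on full model checkpoints rather than short lists.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_weights(a: Weights, b: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * a + alpha * b, parameter by parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("both models must have the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Linear interpolation between the two speakers' values.
        merged[name] = [
            (1.0 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])
        ]
    return merged


# Two made-up "speakers"; alpha = 0.25 keeps the result closer to speaker_a.
speaker_a = {"speaker_embedding": [0.0, 1.0, 2.0]}
speaker_b = {"speaker_embedding": [2.0, 1.0, 0.0]}
new_voice = merge_weights(speaker_a, speaker_b, 0.25)
```

Varying `alpha` between 0 and 1 moves the result from the first speaker toward the second, which is the sense in which interpolation gives control over the generated voice. Optimal transport and prompt-based control are different mechanisms and are not shown here.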