Multi-Speaker TTS
Multi-speaker text-to-speech (TTS) aims to synthesize high-quality speech from text in the voices of multiple speakers, often with control over expressive prosody and style. Current research focuses on improving model architectures, such as diffusion models, and on incorporating multi-modal prompts (e.g., text, images, reference audio) to make generated speech more expressive and controllable. It also addresses challenges such as zero-shot speaker adaptation and robustness to imperfect transcriptions. Advances in this field matter for applications ranging from personalized virtual assistants to accessible communication technologies, and they drive improvements in both the naturalness and the diversity of synthetic speech.