Non-Autoregressive Text-to-Speech
Non-autoregressive text-to-speech (TTS) aims to synthesize speech from text significantly faster than traditional autoregressive methods by generating the entire audio output in parallel rather than one step at a time. Current research focuses on improving the naturalness and speaker similarity of non-autoregressive TTS, using techniques such as diffusion models, masked generative transformers, and variational autoencoders, often combined with speaker embeddings and probabilistic duration modeling for finer control and realism. These advances could make speech synthesis more efficient and versatile, particularly in real-time systems and in applications that require diverse speaker voices.
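To make the role of duration modeling concrete, the following is a minimal sketch of a length regulator, the duration-based expansion step used in FastSpeech-style models. It is a toy illustration and not the method of any particular paper listed on this page. The function names `length_regulate` and `decode_parallel`, the two-dimensional "phoneme vectors", and the hard-coded durations are all hypothetical; in a real system the durations would come from a learned (possibly probabilistic) duration predictor and the decoder would be a neural network.

```python
from typing import List, Sequence


def length_regulate(
    phoneme_vectors: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each phoneme vector by its duration (in output frames).

    After this step the length of the output sequence is known up front,
    which is what lets a decoder produce all frames at once.
    """
    if len(phoneme_vectors) != len(durations):
        raise ValueError("need exactly one duration per phoneme vector")
    frames: List[List[float]] = []
    for vector, duration in zip(phoneme_vectors, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector once per frame so frames do not share storage.
        frames.extend(list(vector) for _ in range(duration))
    return frames


def decode_parallel(frames: Sequence[Sequence[float]], scale: float = 0.5) -> List[List[float]]:
    """Toy stand-in for a decoder (assumption: a simple per-frame scaling).

    Each output frame depends only on its own input frame, never on a
    previously generated output frame, so the frames could be computed
    in any order or all at the same time.
    """
    return [[scale * value for value in frame] for frame in frames]


# Hypothetical inputs: three phonemes with durations of 2, 0, and 3 frames.
phonemes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
durations = [2, 0, 3]

expanded = length_regulate(phonemes, durations)
outputs = decode_parallel(expanded)
print(len(expanded), len(outputs))  # 5 frames in, 5 frames out
```

The point of the sketch is the contrast with autoregressive decoding: because the number of output frames is fixed by the durations before decoding starts, no frame has to wait for the previous one. Changing the durations also gives direct control over speaking rate, which is one reason duration modeling appears so often in this line of work.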