Generative Speech

Generative speech research focuses on building systems that produce realistic, controllable speech from various inputs, such as text or other audio. Current work concentrates on robust models, often built on neural codecs and large language models, that handle diverse tasks such as text-to-speech, voice conversion, and speech enhancement, even under noisy conditions. These advances matter for applications ranging from personalized voice assistants and dubbing to improved accessibility for people with speech impairments. The field also addresses the malicious use of synthetic speech through techniques such as watermarking.

Papers