TTS System

Text-to-speech (TTS) systems synthesize natural-sounding human speech from written text. Current research focuses on improving the quality and efficiency of TTS, particularly for longer passages, by incorporating contextual information across sentences and by employing techniques such as memory-cached recurrence and linearized self-attention within models such as VITS. The work is driven by the need for more expressive and computationally efficient synthesis, with applications ranging from accessibility tools to speech synthesis for low-resource languages, as exemplified by efforts to expand corpora for languages such as Kazakh.
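
To make the efficiency argument concrete, the following is a minimal sketch of one common formulation of linearized self-attention, in which the softmax is replaced by a positive feature map so that keys and values can be summarized once and reused for every query. The `elu(x) + 1` feature map, the function names, and the non-causal setting are assumptions chosen for illustration; this is not taken from any specific VITS variant. The quadratic version is included only to show that both orderings of the computation give the same result.

```python
import math


def feature_map(vec):
    # Assumed feature map: elu(x) + 1, which is strictly positive.
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def linear_attention(queries, keys, values):
    """Non-causal linearized attention: cost grows linearly with sequence length."""
    d, dv = len(keys[0]), len(values[0])
    kv = [[0.0] * dv for _ in range(d)]  # running sum of phi(k) v^T
    k_sum = [0.0] * d                    # running sum of phi(k)
    for k, v in zip(keys, values):
        fk = feature_map(k)
        for i in range(d):
            k_sum[i] += fk[i]
            for j in range(dv):
                kv[i][j] += fk[i] * v[j]
    out = []
    for q in queries:
        fq = feature_map(q)
        norm = sum(fq[i] * k_sum[i] for i in range(d))
        out.append([sum(fq[i] * kv[i][j] for i in range(d)) / norm
                    for j in range(dv)])
    return out


def quadratic_attention(queries, keys, values):
    """Same kernel, but with explicit pairwise weights (quadratic in length)."""
    dv = len(values[0])
    out = []
    for q in queries:
        fq = feature_map(q)
        weights = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in keys]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) / total
                    for j in range(dv)])
    return out
```

Because the key-value summary (`kv`, `k_sum`) has a fixed size independent of sequence length, it is the kind of state that could in principle be carried across sentences, which is why this family of techniques pairs naturally with cached recurrence for long passages.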

Papers