Accent Transfer

Accent transfer in speech synthesis aims to render synthetic speech in a chosen accent while preserving the target speaker's voice characteristics. Current research focuses on disentangling accent from speaker identity in speech data, often using variational autoencoders (VAEs) and hierarchical models within end-to-end text-to-speech (TTS) frameworks such as VITS. These approaches train on both real and synthetic data, including augmented data, to improve the naturalness and accuracy of the transferred accent. The work matters for the realism and accessibility of speech technologies, particularly in multilingual and multicultural applications.
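As a toy illustration of the disentanglement idea described above, the sketch below assumes a TTS model is conditioned on two separate vectors, one for speaker identity and one for accent. This is a simplification for illustration only: the systems summarized here learn such factors with VAEs or hierarchical latent variables rather than fixed lookup tables, and every name in the code (`SPEAKERS`, `ACCENTS`, `conditioning_vector`, `transfer_accent`, the speaker and accent labels) is hypothetical. The point is only that, once the two factors are separated, accent transfer amounts to swapping the accent part while leaving the speaker part untouched.

```python
import random


def make_embedding_table(names, dim, seed):
    """Build a hypothetical lookup table of random embeddings, one per name."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 1.0) for _ in range(dim)] for name in names}


# Hypothetical, separately stored factors: speaker identity and accent.
SPEAKER_DIM = 4
ACCENT_DIM = 3
SPEAKERS = make_embedding_table(["spk_a", "spk_b"], dim=SPEAKER_DIM, seed=0)
ACCENTS = make_embedding_table(["accent_x", "accent_y"], dim=ACCENT_DIM, seed=1)


def conditioning_vector(speaker, accent):
    """Concatenate the speaker and accent embeddings into one conditioning vector."""
    return SPEAKERS[speaker] + ACCENTS[accent]


def transfer_accent(speaker, new_accent):
    """Keep the speaker embedding and swap in a different accent embedding."""
    return conditioning_vector(speaker, new_accent)


original = conditioning_vector("spk_a", "accent_x")
transferred = transfer_accent("spk_a", "accent_y")

# The speaker part is identical; only the accent part differs.
speaker_part_unchanged = original[:SPEAKER_DIM] == transferred[:SPEAKER_DIM]
accent_part_changed = original[SPEAKER_DIM:] != transferred[SPEAKER_DIM:]
```

In a real system the hard part is learning representations for which this swap is valid, that is, ensuring the speaker vector carries no accent information and vice versa; the sketch simply takes that separation as given.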

Papers