Multi-Speaker Text-to-Speech

Multi-speaker text-to-speech (TTS) aims to synthesize natural-sounding speech from text in the voices of many different speakers, including speakers unseen during training. Current research focuses on improving the efficiency and quality of these systems, exploring techniques such as frame selection, data augmentation with large language models, and the adaptation of pre-trained models through methods such as hypernetworks or contrastive learning. These advances matter because they address limited data availability and constrained computational resources, paving the way for more versatile and accessible speech synthesis across diverse languages and speaker demographics.

Papers