Single Speaker

Single-speaker speech processing focuses on models and techniques that analyze and synthesize speech from a single individual. Current research emphasizes large language models and deep learning architectures such as Tacotron2 and VITS, often incorporating self-supervised learning and transfer learning to improve efficiency and performance in low-resource settings. This work is crucial for advancing applications such as text-to-speech synthesis, speech recognition, and source separation, particularly where collecting large multi-speaker datasets is challenging or impractical. High-quality single-speaker models also serve as a foundation for more complex multi-speaker systems.

Papers