Speaker Similarity

Speaker similarity research focuses on accurately representing and manipulating speaker characteristics in speech signals, primarily to improve speech separation, voice conversion, and text-to-speech (TTS) systems. Current work emphasizes robust models, such as those based on transformers, normalizing flows, and diffusion models, that remain less sensitive to variations in pitch and other speaker-specific features, even when training data is limited. These advances matter for speech technologies such as multi-speaker speech recognition, personalized TTS, and voice cloning, where accurately identifying and distinguishing speakers is essential.

Papers