Speaker Identity
Speaker identity is a crucial aspect of human communication, and ongoing research aims to model and manipulate it accurately within speech signals. Current efforts concentrate on disentangling speaker characteristics from other elements of speech, such as content and prosody, using techniques such as variational autoencoders and contrastive learning, often in the context of voice conversion and anonymization. These advances have implications for speaker recognition, voice privacy, and personalized speech technologies, where better speaker modeling can improve both accuracy and robustness.
Papers
Improving severity preservation of healthy-to-pathological voice conversion with global style tokens
Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R. J. J. H. van Son, Tomoki Toda
Towards an Interpretable Representation of Speaker Identity via Perceptual Voice Qualities
Robin Netzorg, Bohan Yu, Andrea Guzman, Peter Wu, Luna McNulty, Gopala Anumanchipalli